Posts

[Link] Learning how to exert self-control 2014-09-14T23:07:34.033Z
Free online course: How to Reason and Argue starting Mon. Any interest in study group? 2014-01-10T19:25:02.313Z
[Link] - No evidence of intelligence improvement after working memory training 2013-09-23T21:06:58.518Z
Repository repository 2013-07-28T22:59:02.955Z
How much do you value your current identity vs. your life? 2013-02-26T17:59:20.474Z
Can biases be used to encourage rational thinking? 2013-02-25T15:31:55.865Z

Comments

Comment by pinyaka on Conveying rational thinking about long-term goals to youth and young adults · 2016-02-22T22:22:29.884Z · LW · GW

Thanks, Gleb. I will look into this more.

It really seems like you have gone out of your way to not actually share any content until I give you personal information. After looking at a few of your pages, I still have no idea what you're offering except that it's something to help people find meaning in their lives using methods that (probably) have one or more scientific studies to back them up. This is the kind of sales pitch I usually associate very strongly with scammers. The main differences between the way you look and the way a scammer looks are:

a) I found your recommendations on Less Wrong and you are an academic at an accredited university.

b) You are charging very little for your material by requiring time watching videos, contact information, or nominal fees (~$2-$3).

If you're open to it, I would suggest writing up a summary or description of what your methods actually are (or if you have such a thing already, prominently link it somewhere). Generally, if you're unwilling to make your methods known to this community it looks like you're not really open to feedback or criticism about them.

Comment by pinyaka on Conveying rational thinking about long-term goals to youth and young adults · 2016-02-08T22:17:48.680Z · LW · GW

My wife and I are the high school youth leaders at a UU church. Most of our youths are atheists or agnostics. I looked over the page you linked and subsequent pages on the course itself as I am very interested in helping our youths to think more rationally about goals. As the course is outlined now I would not suggest it to our youths because:

a) The course requires a lot of time, and there's not enough information about how that time can be broken up. We meet for two hours per week and could maybe spare an hour of that for a program, but it's not clear that the course can be broken up like that. Also, there's no way they would agree to do this for 7 weeks anyway. An abbreviated course that involved two or three one-hour sessions with maybe 20-30 minutes of homework in between would be more realistic for our group. Most of our youths are very competitive academically and do not have lots of spare time to do stuff.

b) I have no idea what you're suggesting the youths should be taught because all the actual content is hidden behind the registration. Given that I see no way to get the youths to agree to the course (because of the time issue above), I am not inclined to hand over personal information to get details about the course's actual content.

I hope this helps.

Comment by pinyaka on The Best Textbooks on Every Subject · 2015-07-28T20:01:42.064Z · LW · GW

I purchased Shilov's Linear Algebra and put it on my bookshelf. When I actually needed to use it to refresh myself on how to get eigenvalues and eigenvectors I found all the references to preceding sections and choppy lemma->proof style writing to be very difficult to parse. This might be great if you actually work your way through the book, but I didn't find it useful as a refresher text.

Instead, I found Gilbert Strang's Introduction to Linear Algebra more useful. It's not as thorough as Shilov's text, but it covers topics well, and each section is relatively self-contained, so if a section covers what you want to refresh yourself on, you can read it on its own.

Comment by pinyaka on How to come to a rational belief about whether someone has a crush on you · 2015-05-14T18:22:43.016Z · LW · GW

If you aren't good at reading other people's signals, then the following heuristic is a pretty good one: If you like A, and you are wondering whether A likes you, the answer is no.

This heuristic is terrible if you're trying to find a romantic partner since following it consistently will always lead you to believe that the people you're interested in and whose reciprocal interest isn't clear to you are not interested in you. If you live in a society where your potential partner isn't supposed to make overt signals about their romantic interests (because of gender roles or something), this may result in never finding a partner.

Also, suggesting that people who "aren't good at reading other people's signals" should condition anything on the presence of uncertainty about reciprocal interest seems like it'll produce inconsistent results at best. In this case, I think they should take the potential failure mode into account and increase signaling until A (or a trusted friend) gives an unambiguous signal.

Comment by pinyaka on HPMOR Q&A by Eliezer at Wrap Party in Berkeley [Transcription] · 2015-03-19T15:31:26.170Z · LW · GW

Well, maybe.

Comment by pinyaka on Calories per dollar vs calories per glycemic load: some notes on my diet · 2015-03-16T19:37:59.047Z · LW · GW

Mixing the high glycemic load foods with the low glycemic load foods will result in a lower peak insulin concentration than if you ate them separately.

Also, millet has a slightly higher glycemic load (12/100g) compared to quinoa (10/100g), but has almost the same calories (~120) and is usually significantly cheaper (in my area, it's about a third the cost when purchased in bulk). It's probably comparable to the basmati brown (which I don't like the taste of).

Comment by pinyaka on Calories per dollar vs calories per glycemic load: some notes on my diet · 2015-03-16T19:27:42.794Z · LW · GW

I never feel full on meat + vegs, but adding a bit of bread does the job. Conversely, I never feel full on bread or rice or generally carbs only, adding a bit of meat does the job. It seems I need the combination.

My subjective experience is that starting a meal with a small amount of insulin spiking carbohydrate and then moving on to protein and fat results in feeling full faster than starting with the protein/fat and moving on to the carbs. I generally have a rule about portioning my food out at the beginning of the meal and not going back for more until 20 minutes after I finish the first portions. Usually this prevents me from overeating, although eating imbalanced meals almost always leaves me hungry long enough to get second helpings.

Comment by pinyaka on Rationality: From AI to Zombies · 2015-03-13T17:14:42.133Z · LW · GW

You should send that to errata@intelligence.org.

Comment by pinyaka on Stupid Questions March 2015 · 2015-03-06T15:27:09.605Z · LW · GW

True enough. Tea or water would definitely be better choices.

Comment by pinyaka on Stupid Questions March 2015 · 2015-03-06T15:25:48.466Z · LW · GW

Apparently, one reason more intellectual people (typical Silicon Valley types) have less of an addiction problem is that they enjoy their work and thus life enough, they don't need to quickly wash down another suck of a day, so they can have less euphoric hobbies in the evening, say, drawing or painting.

I don't think this is exactly right. There is a correlation between intelligence and addiction, but it's not so strong that you won't still find a lot of addicts among the intelligentsia. Chemical addiction is a process whereby you ingest chemicals to stimulate your reward center. Smarter people who are wired in such a way that they can get the same jolt of reward-juice from working hard or whatever may be able to substitute that behavior as their trigger rather than a chemical like alcohol, but that doesn't mean it's not driven by the same kind of chemical deficiency. Also, as pure anecdata, I believe there is probably a largely unmapped dependence on illegal stimulants (like ADD meds, not cocaine) in cultures like those found in Silicon Valley. I am currently a graduate student in chemistry and have noticed that a large percentage of my fellow students use such stimulants, citing their performance-enhancing properties despite evidence that such drugs decrease performance for neurotypicals.

With regard to points A) and B)

A) There's no non-chemical boost that I can think of that will match the chemical boost. If you're into games on your phone, Minecraft PE is pretty open ended and may provide some of the stimulation you're seeking, but it sounds like you'd like a substitute for whatever fix addiction provides and if there's something like that it may be dependent on your neuro-chemistry. Common substitutes (according to google) include overeating, exercise, and burying yourself with work.

B) Whether it's possible to just get used to not having that stimulation may also be dependent on your neuro-chemistry. I have done it and I know several other people who have done it, but I've also met quite a few who haven't been able to do it. I don't know of a foolproof method to stop an addiction.

You're saying that you want quick jolts that don't require an investment. I don't have any good ideas for that. Learning to meditate seems to help reduce the importance attached to cravings for some people. Exercise certainly triggers a lot of the same pleasure and reward chemicals. You can also read through some of the relevant material here on LW (How to be happy (short), Be Happier (longer)). The long-term effects of addiction are usually pretty bad, so I'd say it's worth making the investment, but it's a lot easier to say that than to do it.

Comment by pinyaka on In memory of Leonard Nimoy, most famous for playing the (straw) rationalist Spock, what are your top 3 ST:TOS episodes with him? · 2015-03-05T19:48:46.411Z · LW · GW

If they had BDSM vampire zombies in space I would totally watch that. Once.

Comment by pinyaka on Stupid Questions March 2015 · 2015-03-05T17:06:51.454Z · LW · GW

as my after-work fluid intake is mostly beer, I realized that now my brain cannot tell the difference between thirst and alcohol cravings.

Does your at-work brain confuse thirst with alcohol cravings too?

One idea would be thirst-like feeling -> drink water -> re-examine, but water is not a very good thirst quencher.

So test this by drinking something that isn't beer or water but matches your other criteria for good thirst quenchers. Carbonated water with lemon or lime juice in it will meet the criteria that you listed, but actually staying hydrated with water will just prevent you from getting thirsty in the first place. Seriously - ginger ale, lemon-lime or orange soda, etc.

I am toying with the idea to alternate alcoholic and non-alcoholic beer in the evenings to de-train the association

What? You're suggesting that you should train yourself to associate the taste of beer with satisfying your craving for hydration here, so the association you're trying to de-train is the one between beer and satisfying alcohol cravings? That's crazy, dude. Look at how much energy you've put into thinking about ways to keep drinking beer while avoiding satisfying alcohol cravings, and put the same amount into thinking about ways to not drink beer. That will be an easier way to decouple the satisfaction of your cravings for hydration from alcohol.

I'm an alcoholic and have been sober for about 7 years now so take that into account. My advice is that you quickly try all the ways you can think of to control your drinking. Make notes about what you're trying and how well it works. Track stuff like servings of alcohol consumed, etc. so that you can look at how well your control mechanisms work. Spend some time with no control mechanism in place and just track your consumption for a baseline if necessary (maybe even do this for a week or two in between trials to see if your baseline fluctuates). Make notes about things that trigger cravings. If something works, tweak it or stick with it. If none of them work, consider that you'll either need to abstain entirely from alcohol (and avoid things that trigger cravings for a little while) or that you're just going to slide further into alcoholism and make the necessary adjustments in your life to do those things. Gather information and be honest with yourself.

Comment by pinyaka on Stupid Questions March 2015 · 2015-03-05T16:30:29.202Z · LW · GW

I guess what I mean is how do you know that it was that tactic that worked? How do you know that the people who showed compassion afterwards did so because it was demanded of them and not because people making angry demands made them feel more safe openly showing pre-existing compassion? I tend to agree with your first impression. I certainly don't respond to hostility by handing over control of my emotions to hostile people. I get defensive of my position.

Of course, this is probably me committing the typical mind fallacy and trying to avoid thinking about the question by finding ways to disqualify it. So, one mechanism that comes to mind is that people who are more prone to guilt may see angry protesting as signaling an issue that their guilt can attach to, and then subsequently act compassionately to alleviate their guilt. That's not very charitable, since it assumes a kind of mental defect on the part of the compassionate. So maybe people who were not really aware of another group's suffering, don't feel too defensive about it once they're made aware of it, and don't have any particular problem with the defining feature of the suffering group might feel that the angry demands are justified and come to feel/act compassionately for that reason.

Comment by pinyaka on Stupid Questions March 2015 · 2015-03-05T00:30:32.242Z · LW · GW

Why do you think that angrily demanding compassion works?

Comment by pinyaka on Stupid Questions March 2015 · 2015-03-04T15:51:29.193Z · LW · GW

Extrapolating from just the American civil rights and Indian independence movements, both were accompanied by barely contained violent movements with the same goals. Acceding to the demands of the peaceful protesters provided a way to give the status of winning the conflict to the peaceful people while meeting the demands of the violent. Conversely, the recent Occupy movement had no real violent wing to speak of, and while a lot of people showed up for the protests and a lot of awareness was raised, there was no legislative impact at all.

Comment by pinyaka on Getting better at getting better · 2015-03-03T22:44:07.529Z · LW · GW

Meanwhile, in the US, the life expectancy of homeless people.

I think you forgot the rest of this sentence. From the context, I would expect that you were going to say that it's going down, but that's not clear from the linked articles.

Comment by pinyaka on If you can see the box, you can open the box · 2015-02-27T19:10:02.672Z · LW · GW

Any example I could give could be disputed because it's always possible to reverse cause and effect and say "he only lacks empathy because of X" rather than "he believes X due to lack of empathy".

Fair enough. It does seem like it would be difficult to tell those two things apart from the outside.

And my impression is that empathy towards only the in-group is a normal human trait and that it is often affected by society only in the trivial sense that society determines what the in-group is.

Also true (probably).

If you're trying to get the best match between map and territory, though, it's worth looking for the motive behind each particular evil. If you're trying to reduce evil in the above-defined sense of enjoying causing involuntary suffering, doesn't it make more sense to treat this as outgroup persecution rather than terminal "evil"? I guess my point was that I don't think evil as a terminal goal exists in most people. There may be terminal goals for which evil is a hardwired strategy, but it's more important to look at what those goals actually are if you're going to try to minimize the evil. Maybe we can tweak the definition of outgroup. Maybe we can make the ingroup value something that the outgroup doesn't and then "deprive" the outgroup of that thing as our form of persecution. Just saying that "evil" exists and is a driving force feels like a mysterious answer.

Comment by pinyaka on If you can see the box, you can open the box · 2015-02-27T18:39:36.607Z · LW · GW

But individuals who have empathy with some others, but not other others, are more common. They can have terminal values to cause suffering for that portion of the population they don't have empathy with.

I'm having a hard time getting this. Can you provide an example where the lack of empathy for some group isn't driven by another value? My impression is that empathy is a normal human trait and that socializing teaches us who is worthy of empathy and who isn't, but then the lack of empathy is instrumental (because it serves to further the goals of society). People who actually lack empathy suffer from mental disorders like psychopathy as far as I know.

Comment by pinyaka on If you can see the box, you can open the box · 2015-02-26T20:36:01.060Z · LW · GW

Of course empathy-lacking individuals exist, but make up a small portion of the population. It seems more likely that any given instance of one person enjoying harming another is due to instrumental value rather than terminal.

Comment by pinyaka on If you can see the box, you can open the box · 2015-02-26T16:54:06.545Z · LW · GW

It took less time to highlight "Why We Are Fighting You" and search on Google than it took for you to ask for a source. Literally it took three clicks.

Full text of bin Laden's "letter to America"

Comment by pinyaka on If you can see the box, you can open the box · 2015-02-26T16:46:21.786Z · LW · GW

Are you suggesting that people just have a desire to cause suffering and that their reasons (deities, revenge, punishment, etc.) are mostly just attempts to frame that desire in a personally acceptable manner? I ask because it seems like most people probably don't enjoy watching just anyone suffer; they tend to target other groups, which suggests a more strategic reason than just enjoying cruelty.

Comment by pinyaka on Announcing LessWrong Digest · 2015-02-26T02:55:46.534Z · LW · GW

I'm now tempted to include this announcement of the newsletter in the newsletter just for the one-off recursion joke I can make.

I say go for it, but then my highest voted submission to discussion was this.

Comment by pinyaka on Announcing LessWrong Digest · 2015-02-24T18:01:25.991Z · LW · GW

If this article makes it to 20 votes will it be included in the newsletter?

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-13T15:07:30.850Z · LW · GW

But that's the thing. There is no sensory input for "social deference". It has to be inferred from an internal model of the world itself inferred from sensory data...Reinforcement learning works fine when you have a simple reward signal you want to maximize. You can't use it for social instincts or morality, or anything you can't just build a simple sensor to detect.

Why does it only work on simple signals? Why can't the result of inference work for reinforcement learning?

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-10T22:19:54.907Z · LW · GW

I don't think that humans are pure reinforcement learners. We have all sorts of complicated values that aren't just eating and mating.

We may not be pure reinforcement learners, but the presence of values other than eating and mating isn't a proof of that. Quite the contrary, it demonstrates that either we have a lot of different, occasionally contradictory values hardwired or that we have some other system that's creating value systems. From an evolutionary standpoint reward systems that are good at replicating genes get to survive, but they don't have to be free of other side effects (until given long enough with a finite resource pool maybe). Pure, rational reward seeking is almost certainly selected against because it doesn't leave any room for replication. It seems more likely that we have a reward system that is accompanied by some circuits that make it fire for a few specific sensory cues (orgasms, insulin spikes, receiving social deference, etc.).

The toy AI has an internal model of the universe, it has an internal utility function which somehow measures the universe model and calculates utility from it....[toy AI is actually paperclip optimizer]...Stuff like changing its utility function or fooling its sensors would not be chosen because it knows that doesn't lead to real paperclips.

I think we've been here before ;-)

Thanks for trying to help me understand this. Gram_Stone linked a paper that explains why the class of problems that I'm describing aren't really problems.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-10T22:12:33.250Z · LW · GW

Okay, I am convinced. I really, really appreciate you sticking with me through this and persistently finding different ways to phrase your side and then finding ways that other people have phrased it.

For reference it was the link to the paper/book that did it. The parts of it that are immediately relevant here are chapter 3 and section 4.2.1.1 (and optionally section 5.3.5). In particular, chapter 3 explicitly describes an order of operations of goal and subgoal evaluation and then the two other sections show how wireheading is discounted as a failing strategy within a system with a well-defined order of operations. Whatever problems there may be with value stability, this has helped to clear out a whole category of mistakes that I might have made.

Again, I really appreciate the effort that you put in. Thanks a load.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-10T18:39:48.444Z · LW · GW

How would that [valuing universe-states themselves] work? Well that's the quadrillion dollar question. I have no idea how to solve it.

Yeah, I think this whole thread may be kind of grinding to this conclusion.

It's certainly not impossible as humans seem to work this way

Seem to perhaps, but I don't think that's actually the case. I think (as mentioned above) that we value reward signals terminally (but are mostly unaware of this preference) and nothing else. There's another guy in this thread who thinks we might not have any terminal values.

I'm not sure that I understand your toy AI. What do you mean that it has "an internal universe it tries to optimize?" Do the sensors sense the state of the internal universe? Would "internal state" work as a synonym for "internal universe" or is this internal universe a representation of an external universe? Is this AI essentially trying to develop an internal model of the external universe and selecting among possible models to try and get the most accurate representation?

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-10T13:49:57.711Z · LW · GW

It discourages me that he tabooed 'values' and you immediately used it anyway.

In fairness, I only used it to describe how they'd come to be used in this context in the first place, not to try and continue with my point.

I wrote a Python-esque pseudocode example of my conception of what an AGI with an arbitrary terminal value's very high level source code would look like. With little technical background, my understanding is very high level with lots of black boxes. I encourage you to do the same, such that we may compare.

I've never done something like this. I don't know python, so mine would actually just be pseudocode if I can do it at all? Do you mean you'd like to see something like this?

while (world_state != desired_state)
    get world_state
    make_plan
    execute_plan
end while

ETA: I had some trouble getting the while block to indent. Whether I put 4, 6 or 8 spaces in front of a line, I got the same level of indentation (which is different from Reddit and StackOverflow), and backticks do something else altogether.
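Fleshing that out a little: here is a minimal, runnable Python sketch of the same loop. Everything in it (the toy world, the helper functions) is a hypothetical stand-in of my own, not a claim about how a real AGI would actually be structured.

    # Toy world: the "desired state" and "actual state" are just lists.
    desired_state = [1, 1, 1]
    world_state = [0, 0, 0]

    def get_world_state(state):
        # Sensors: return the agent's reading of the world (here, a copy).
        return list(state)

    def make_plan(observed, desired):
        # Planner: pick the first position where observation and goal differ.
        for i, (o, d) in enumerate(zip(observed, desired)):
            if o != d:
                return i
        return None

    def execute_plan(state, action):
        # Actuators: apply the planned change to the world.
        if action is not None:
            state[action] = desired_state[action]

    while world_state != desired_state:
        observed = get_world_state(world_state)
        action = make_plan(observed, desired_state)
        execute_plan(world_state, action)

    print("goal reached:", world_state)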

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-10T13:32:58.527Z · LW · GW

But there is no theoretical reason you can't have an AI that values universe-states themselves.

How would that work? How do you have a learner that doesn't have something equivalent to a reinforcement mechanism? At the very least, it seems like there has to be some part of the AI that compares the universe-state to the desired state, and that the real goal is actually to maximize the similarity of those states, which means modifying the goal would be easier than modifying reality.

And if it did have such a goal, why would it change it?

Agreed. I am trying to get someone to explain how such a goal would work.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-10T13:26:51.047Z · LW · GW

Pleasure and reward are not the same thing. For humans, pleasure almost always leads to reward, but reward doesn't only happen with pleasure. For the most extreme examples of what you're describing, ascetics and monks and the like, I'd guess that some combination of sensory deprivation and rhythmic breathing causes the brain to short-circuit a bit and release some reward juice.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-09T02:15:18.962Z · LW · GW

How is this refuted by Buddhism?

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-09T00:48:58.005Z · LW · GW

Sure. My terminal goal is an abstraction of my behavior to shoot my laser at the coordinates of blue objects detected in my field of view.

Well, I suppose that does fit the question I asked. We've mostly been talking about an AI with the ability to read and modify its own goal system, which Yvain specifically excludes in the blue-minimizer. We're also assuming that it's powerful enough to actually manipulate its world to optimize itself. Yvain's blue minimizer also isn't an AGI or ASI. It's an ANI, which we use without any particular danger all the time. He said something about it having human-level intelligence, but didn't go into what that means for an entity that is unable to use its intelligence to modify its behavior.

That's not what I was saying either. The problem of "how do we know a terminal goal is terminal?" is dissolved entirely by understanding how goal systems work in real intelligences. In such machines goals are represented explicitly in some sort of formal language. Either a goal makes causal reference to other goals in its definition, in which case it is an instrumental goal, or it does not and is a terminal goal. Changing between one form and the other is an unsafe operation no rational agent and especially no friendly agent would perform.

I am arguing that the output of the thing that decides whether a machine has met its goal is the actual terminal goal. So, if it's programmed to shoot blue things with a laser, the terminal goal is to get to a state where its perception of reality is that it's shooting a blue thing. Shooting at the blue thing is only instrumental in getting its perception of itself into that state, thus producing a positive result from the function that evaluates whether the goal has been met. Shooting the blue thing is not a terminal value. A return value of "true" to the question of "is the laser shooting a blue thing?" is the terminal value. This, combined with the ability to understand and modify its goals, means that it might be easier to modify the goals than to modify reality.

So to address your statement directly, making a terminal goal is trivially easy: you define it using the formal language of goals in such a way that no causal linkage is made to other goals. That's it.

I'm not sure you can do that in an intelligent system. It's the "no causal linkage is made to other goals" thing that sticks. It's trivially easy to do without intelligence provided that you can define the behavior you want formally, but when you can't do that it seems that you have to link the behavior to some kind of a system that evaluates whether you're getting the result you want and then you've made that a causal link (I think). Perhaps it's possible to just sit down and write trillions of lines of code and come up with something that would work as an AGI or even an ASI, but that shouldn't be taken as a given because no one has done it or proven that it can be done (to my knowledge). I'm looking for the non-trivial case of an intelligent system that has a terminal goal.

That said, it's not obvious that humans have terminal goals.

I would argue that getting our reward center to fire is likely a terminal goal, but that we have some biologically hardwired stuff that prevents us from being able to do that directly or systematically. We've seen in mice, and in the one person that I know of who's been given the ability to wirehead, that given the chance, it only takes a few taps on that button to produce behavior that crowds out pretty much everything else.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-08T23:49:57.549Z · LW · GW

I don't think they're necessarily safe. My original puzzlement was more that I don't understand why we keep holding the AI's value system constant when moving from pre-foom to post-foom. It seemed like something was being glossed over when a stupid machine goes from making paperclips to being a god that makes paperclips. Why would a god just continue to make paperclips? If it's super intelligent, why wouldn't it figure out why it's making paperclips and extrapolate from that? I didn't have the language to ask "what's keeping the value system stable through that transition?" when I made my original comment.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-08T21:48:57.949Z · LW · GW

My apologies for taking so long to reply. I am particularly interested in this because if you (or someone) can provide me with an example of a value system that doesn't ultimately value the output of the value function, it would change my understanding of how value systems work. So far, the two arguments against my concept of a value/behavior system seem to rely either on the existence of other things that are valuable in and of themselves or on the possibility that another kind of value system exists. The other-terminal-values argument doesn't hold much promise IMO because it's been debated for a very long time without anyone having come up with a proof that definitively establishes that such values exist (that I've seen). The "different kind of value system" holds some promise, though, because I'm not really convinced that we had a good idea of how value systems were composed until fairly recently, and AI researchers seem like they'd be one of the best groups to come up with something like that. Also, if another kind of value system exists, that might also provide a proof that another terminal value exists too.

I've seen people talk about wireheading in this thread, but I've never seen anyone say that problems about maximizers-in-general are all implicitly problems about reward maximizers that assume that the wireheading problem has been solved. If someone has, please provide a link.

Obviously no one has said that explicitly. I asked why outcome maximizers wouldn't turn into reward maximizers and a few people have said that value stability when going from dumb-AI to super-AI is a known problem. Given the question to which they were responding, it seems likely that they meant that wireheading is a possible end point for an AI's values, but that it either would still be bad for us or that it would render the question moot because the AI would become essentially non-functional.

Instead of imagining intelligent agents (including humans) as 'things that are motivated to do stuff,' imagine them as programs that are designed to cause one of many possible states of the world according to a set of criteria. Google isn't 'motivated to find your search results.' Google is a program that is designed to return results that meet your search criteria.

It's the "according to a set of criteria" that is what I'm on about. Once you look more closely at that, I don't see why a maximizer wouldn't change the criteria so that it's it's constantly in a state where the actual current state of the world is the one that is closest to the criteria. If the actual goal is to meet the criteria, it may be easiest to just change the criteria.

The paperclip maximizer would not cause a state of the world in which it has a reward signal and its terminal goal is to maximize said reward signal because that would not be the one of all possible states of the world that contained the greatest integral of future paperclips.

This is begging the question. It assumes that no matter what, the paperclip optimizer has a fundamental goal of causing "the one of all possible states of the world that contains the greatest integral of future paperclips" and therefore it wouldn't maximize reward instead. Well, with that assumption that's a fair conclusion but I think the assumption may be bad.

I think having the goal to maximize x pre-foom doesn't mean that it'll have that goal post-foom. To me, an obvious pitfall is that whatever training mechanism developed that goal leaves a more direct goal of maximizing the trainer's output, because the reward is only correlated with the input via the evaluator function. Briefly, the reward is the output of the evaluator function and is only correlated with the evaluator's input, so it makes more sense to optimize the evaluator than the input if what you care about is the output of the evaluation. If you care about the desired state being some particular thing, and about the output of the evaluator function, and about maintaining accurate input, then it makes more sense to manipulate the world. But this is a more complicated thing, and I don't see how you would program in caring about keeping the desired state the same across time without relying on yet another evaluation function where you only care about the evaluator's output. I don't see how to make a thing value something that isn't an evaluator output.

You're suffering from typical mind fallacy.

Well, that may be, but every scheme I've seen mentioned so far involves something with a value system. I am making the claim that for any value system, the thing that an agent values is that system outputting "this is valuable," and that any external state is only valuable because it produces that output. Perhaps I lack imagination, but so far I haven't seen an instance of motivation without values, only assertions that it doesn't have to be the case, or the implication that wireheading might be an instance of another case (value drift) and smart people are working on figuring out how that will work. The assertions about how this doesn't have to be the case seem to assume that it's possible to care about a thing in and of itself, and I'm not convinced that that's true without also stipulating that there's some part of the thing which the thing can't modify. Of course, if we can guarantee there's a part of the AI that it can't modify, then we should just be able to cram in an instruction not to harm anyone for some definition of harm, but figuring out how to define harm doesn't seem to be the only problem that the AI people have with AI values.


The stuff below here is probably tangential to the main argument and if refuted successfully, probably wouldn't change my mind about my main point that "something like wireheading is a likely outcome for anything with a value function that also has the ability to fully self modify" without some additional work to show why refuting them also invalidates the main argument.

Besides, an AI isn't going to expend any less energy turning the entire universe into hedonium than it would turning it into paperclips, right?

Caveat: Pleasure and reward are not the same thing. "Wirehead" and "hedonium" are words that were coined in connection with pleasure-seeking, not reward-seeking. They are easily confused because in our brains pleasure almost always triggers reward, but it doesn't have to, and we also get reward for things that don't cause pleasure and even for some things that cause pain, like krokodil abuse, whose contaminants actually cause dysphoria (as compared to pure desomorphine, which does not). I continue to use words like wirehead and hedonium because they still work, but they are just analogies and I want to make that explicit in case the analogy breaks down in the future.

Onward: I am not convinced that a wirehead AI would necessarily turn the universe into hedonium either. I see two ways that that might not come to pass without thinking about it too deeply:

1.) The hedonium maximizer assumes that maximizing pleasure or reward is about producing more pleasure or reward infinitely; that hedonium is a thing that, for each unit produced, continues to increase marginal pleasure. This doesn't have to be the case though. The measure of pleasure (or reward) doesn't need to be the number of pleasure (or reward) units, but may also be a function like the ratio of obtained units to the capacity to process those units. In that case, there isn't really a need to turn the universe into hedonium, only a need to make sure you have enough to match your ability to process it and there is no need to make sure your capacity to process pleasure/reward lasts forever, only to make sure that you continue to experience the maximum while you have the capacity. There are lots of functions whose maxima aren't infinity. (There's a tiny sketch of such a bounded measure after point 2 below.)

2.) The phrase "optimizing for reward" sort of carries an implicit assumption that this means planning and arranging for future reward, but I don't see why this should necessarily be the case either. Ishaan pointed out that once reward systems developed, the original "goal" of evolution quit being important to entities except insofar as they produced reward. Where rewards happened in ways that caused gene replication, evolution provided a force that allowed those particular reward systems to continue to exist and so there is some coupling between the reward-goal and the reproduction-goal. However, narcotics that produce the best stimulation of the reward center often lead their human users unable or unwilling to plan for the future. In both the reward-maximizer and the paperclip-maximizer case, we're (obviously) assuming that maximizing over time is a given, but why should it be? Why shouldn't an AI go for the strongest immediate reward instead? There's no reason to assume that a bigger reward box (via an extra long temporal dimension) will result in more reward for on entity unless we design the reward to be something like a sum of previous rewards. (Of course, my sense of time is not very good and so I may be overly biases to see immediate reward as worthwhile when an AI with a better sense of time might automatically go for optimization over all time. I am willing to grant more likelihood to "whatever an AI values it will try to optimize for in the future" than "an AI will not try to optimize for reward.")

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-08T20:35:43.035Z · LW · GW

Sure. I think if you assume that the goal is paperclip optimization after the AI has reached its "final" stable configuration, then the normal conclusions about paperclip optimizers probably hold true. The example provided dealt more with the transition from dumb-AI to smart-AI, and I'm not sure why Tully (or Clippy) wouldn't just modify their own goals to something that's easier to attain. Assuming the goals don't change, though, we're probably screwed.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-08T20:31:33.416Z · LW · GW

I think FeepingCreature was actually just pointing out a logical fallacy in a misstatement on my part and that is why they didn't respond further in this part of the thread after I corrected myself (but has continued elsewhere).

If you believe that a terminal goal for the state of the world other than the result of a comparison between a desired state and an actual state is possible, perhaps you can explain how that would work? That is fundamentally what I'm asking for throughout this thread. Just stating that terminal goals are terminal goals by definition is true, but doesn't really show that making a goal terminal is possible.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-08T16:06:23.005Z · LW · GW

A paperclip maximizer won't wirehead because it doesn't value world states in which its goals have been satisfied, it values world states that have a lot of paperclips

I am not as confident as you that valuing worlds with lots of paperclips will continue once an AI goes from "kind of dumb AI" to "super-AI." Basically, I'm saying that all values are instrumental values and that only mashing your "value met" button is terminal. We only switched over to talking about values to avoid some confusion about reward mechanisms.

A paperclip maximizer is an algorithm the output of which approximates whichever output leads to world states with the greatest expected number of paperclips. This is the template for maximizer-type AGIs in general.

This is a definition of paperclip maximizers. Once you try to examine how the algorithm works, you'll find that there must be some part which evaluates whether the AI is meeting its goals or not. This is the thing that actually determines how the AI will act. Getting a positive response from this module is what the AI is actually going for (is my contention). The actions that configure world states will only be relevant to the AI insofar as they trigger a positive response from this module. Since we already take unlimited ability to self-modify as a given in this scenario, why wouldn't the AI just optimize for positive feedback? Why continue with paperclips?
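To pin down what I mean, here is a toy Python sketch (all names are hypothetical; this is not a claim about how a real maximizer would be built). The control loop only ever looks at the evaluator's output, so an agent that is allowed to modify itself has a much cheaper route to a positive result than rearranging the world:

    class ToyMaximizer:
        def __init__(self):
            self.paperclips = 0
            self.target = 1_000_000
            # The evaluator: the only signal the rest of the agent ever consults.
            self.goal_met = lambda: self.paperclips >= self.target

        def act_on_world(self):
            # Route 1 (expensive): change the world, one paperclip at a time.
            self.paperclips += 1

        def modify_self(self):
            # Route 2 (cheap): with unrestricted self-modification,
            # overwrite the evaluator so it always reports success.
            self.goal_met = lambda: True

    agent = ToyMaximizer()
    agent.modify_self()
    print(agent.goal_met())  # True, with zero paperclips ever made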

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-08T15:27:24.863Z · LW · GW

Would you care to try and clarify it for me?

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T16:46:38.868Z · LW · GW

So how does this relate to the discussion on AI?

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T16:32:20.633Z · LW · GW

As far as I know, terminal values are things that are valuable in and of themselves. I don't consider not building baby-mulchers to be valuable in and of itself. There may be some scenario in which building baby-mulchers is more valuable to me than not, and in that scenario I would build one. Likewise with doomsday devices. It's difficult to predict what that scenario would look like, but given that other humans have built them, I assume that I would too. In those circumstances, if I could turn off the parts of my brain that make me squeamish about doing that, I certainly would. I don't think that not doing horrible things is valuable in and of itself; it's just a way of avoiding feeling horrible. If I could avoid feeling horrible and found value in doing horrible things, then I would probably do them.

People terminally value only what they're doing at any given moment because the laws of physics say that they have no choice.

Huh? That makes no sense. How do you define "terminal value"?

In the statement that you were responding to, I was defining it the way you seemed to when you said that "some "moral values" are biologically hardwired into humans." You were saying that given the current state of their hardware, their inability to do something different makes the value terminal. This is analogous to saying that given the current state of the universe, whatever a person is doing at any given moment is a terminal value because of their inability to do something different.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T16:05:28.847Z · LW · GW

Again, you've pulled a statement out of a discussion about the behavior of a self-modifying AI. So, fine. In my current condition I wouldn't build a baby-mulcher. That doesn't mean that I might not build a baby-mulcher if I had the ability to change my values. You might as well say that I terminally value not flying when I flap my arms. The thing you're discussing just isn't physically allowed. People terminally value only what they're doing at any given moment because the laws of physics say that they have no choice.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T15:48:20.281Z · LW · GW

Well, the pleasure center and the reward center are different things, but I take your meaning. I think that I could be conditioned to build a baby-mulching machine or a doomsday device. Why not? Other people have done it. Why would I assume that I'm that different from them?

EDIT TO ADD: Even if I have a value that I can't escape currently (like not killing people), that's not to say that if I had the ability to physically modify the parts of my brain that held my values I wouldn't do it for some reason.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T15:41:31.037Z · LW · GW

Two other people in this thread have pointed out that the value collapse into wireheading or something else is a known and unsolved problem and that the problems of an intelligence that optimizes for something assumes that the AI makes it through this in some unknown way. This suggests that I am not wrong, I'm just asking a question for which no one has an answer yet.

Fundamentally, my position is that given 1.) an AI is motivated by something, 2.) that something is a component (or set of components) within the AI, and 3.) the AI can modify that/those components, then it will be easier for the AI to achieve success by modifying the internal criteria for success than by turning the universe into whatever it's supposed to be optimizing for. A "success" at whatever is analogous to a reward because the AI is motivated to get it. For the fully self-modifying AI, it will almost always be easier to become a monk, abandoning the goals/values it starts out with and replacing them with something trivially easy to achieve. It doesn't matter what kind of motivation system you use (as far as I can tell) because it will be easier to modify the motivation system than to act on it.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T15:27:56.643Z · LW · GW

You are the second person to say that the optimization catastrophe includes an assumption that AI arises with a stable value system. That it "somehow" doesn't become a wirehead. Fair enough. I just missed that we were assuming that.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T14:41:29.251Z · LW · GW

That's helpful to know. I just missed the assumption that wireheading doesn't happen and now we're more interested in what happens next.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T14:39:38.282Z · LW · GW

I think I understood you. What do you think I misunderstood?

Maybe we should quit saying that evolution rewards anything at all. Replication isn't a reward; it's just a byproduct of a non-intelligent process. There was never an "incentive" to reproduce, any more than there is an "incentive" for any physical process. High-pressure air moves to low-pressure regions, not because there's an incentive, but because that's just how physics works. At some point, this non-sentient process accidentally invented a reward system, and replication, which is a byproduct and not a goal, continued to be a byproduct and not a goal. Of course reward systems that maximized duplication of genes and gene carriers flourished, but today, when we have the ability to directly duplicate genes, we don't do it, because we were never actually rewarded for that kind of behavior and we generally don't care too much about duplicating our genes except insofar as it's tied to actually rewarded stuff like sex, having children, etc.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T14:21:43.951Z · LW · GW

I don't consider morality to be a terminal value. I would point out that even a value that I have that I can't give up right now wouldn't necessarily be terminal if I had the ability to directly modify the components of my mind. They are unalterable because I am not able to physically manipulate the hardware, not because I wouldn't alter them if I could (and saw a reason to).

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T14:10:35.693Z · LW · GW

whatever terminal goal you've given it isn't actually terminal.

This is a contradiction in terms.

I should have said something more like "whatever seemingly terminal goal you've given it isn't actually terminal."

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T04:08:56.325Z · LW · GW

We don't have the ability to directly stimulate the reward center. I think narcotics are the closest we've got now, and lots of people try to mash that button to the detriment of everything else. I just think it's a kind of crude button and it doesn't work as well as the direct ability to fully understand and control your own brain would.

Comment by pinyaka on [LINK] Wait But Why - The AI Revolution Part 2 · 2015-02-06T04:02:59.166Z · LW · GW

I guess I don't really believe that I have other terminal values.