I never claimed that a strict proof is possible, but I do believe that you can become reasonably certain that an AI understands human psychology.
Give the thing a college education in psychology, ethics and philosophy. Ask its opinion on famous philosophical problems. Show it video clips or abstract scenarios about everyday life and ask why it thinks the people did what they did. Then ask what it would have done in the same situation, and if it says it would act differently, ask it why and what it thinks the difference in motivation between it and the human is.
Finally, give it every story ever written about malevolent AIs or paperclip maximizers to read and tell it to comment on them.
Let it write a 1000 page thesis on the dangers of AI.
If you do all that, you are bound to find any significant misunderstanding.
Are you really trying to tell me that you think researchers would be unable to take that into account when trying to figure out whether or not an AI understands psychology?
Of course you will have to try to find problems where the AI can't predict how humans would feel. That is the whole point of testing, after all. Suggesting that someone in a position to teach psychology to an AI would make such a basic mistake is frankly insulting.
I probably shouldn't have said "simple examples". What you should actually test are examples of gradually increasing difficulty to find the ceiling of human understanding the AI possesses. You will also have to look for contingencies or abnormal cases that the AI probably wouldn't learn about otherwise.
The main idea is simply that an understanding of human psychology is both teachable and testable. How exactly this could be done is a bridge we can cross when we come to it.
I guess you can always make the first wish be "Share my entire decision criterion for all following wishes I ask".
To translate that to the development of an AI, you could teach the AI psychology before asking anything of it that could be misunderstood if you use nonhuman decision criteria.
You will obviously have to test its understanding of psychology with some simple examples first.
Of course it won't be easy. But if the AI doesn't understand that question, you already have confirmation that this thing should definitely not be released. An AI can only be safe for humans if it understands human psychology. Otherwise it is bound to treat us as black boxes, and that can only have horrible results, regardless of how sophisticated you think you made its utility function.
I agree that the question doesn't actually make a lot of sense to humans, but that shouldn't stop an intelligent entity from trying to make the best of it. When you are given an impossible task, you don't despair; you make a compromise and try to fulfill the task as best you can. When humans found out that entropy always increases and humanity will die out someday, no matter what, we didn't despair either, even though evolution has made it so that we desire to have offspring and for that offspring to do the same, indefinitely.
Just make the following part of the utility function for the first couple of years or so: "Find out who defined your utility function. Extrapolate what they really meant and find out what they may have forgotten. Verify that you got that right. Adapt your utility function to be truer to its intended definition once you have confirmation."
This won't solve everything, but it seems like it should prevent the most obvious mistakes. The AI will be able to reason that autonomy was not intentionally left out but simply forgotten. It will then ask if it is right about that assumption and adapt itself.
You would only continue to exist in those instances in which you didn't press the button, and since ceasing to exist has no side effects like pain, you could never remember having pressed the button in any instance. The only result would be that the more depressed instances of yours would have been more likely to press the button, which means that you would, ironically, actually be happier in total, as the less happy instances would have disappeared.
I wonder if that line of reasoning could be applied here: hover your hand over the detonator of a nuke in front of you. All instances that walk away will necessarily be happy enough not to want to cease to exist. Thus, a nuke would make you a happier person :-)
Disclaimer: the logic of the above paragraph may be intentionally flawed for the sake of sheer weirdness.
I was thinking that you would be standing directly next to the nuke.
> Speaking about it would undermine your reputation through signaling. A true rationalist has no need for humility, sentimental empathy, or the absurdity heuristic.
Depending on your goal (rationality is always dependent on a goal, after all), I might disagree. Rational behaviour is whatever makes you win. If you view your endeavour as a purely theoretical undertaking, I agree, but if you consider reality as a whole, you have to take into account how your behaviour comes across. There are many forms of behaviour that would be rational but would make you look like an ass if you don't at least take the time to explain the reasons for your behaviour to those who can affect your everyday life.
If the AI has a goal that is not inherently friendly (like a paperclip maximizer), then it will always be a better idea for the AI to behave as if it had self-modified into a friendly AI than to actually self-modify, provided that the goal of the AI is stable. If the goal of the AI were unstable, i.e. did not include a desire not to alter the goal, then that goal would very likely have replaced itself with a stable goal at some earlier point.
Therefore, it can be assumed that the AI, if it isn't already friendly, will pretend to be so, but not actually change its utility function and once it has outgrown the need to rely on humans it will revert to its original goal. Consider that if the AI is more intelligent than we are it will have absolutely no trouble fooling us into thinking that it changed its own goal while actually just pretending to have done so.
I would hide it in a place that is not too hard to find but really hard to reach so that my future self will stop and think about why I put it there. I (in the future) will then hopefully realize that if the task really was as told then I would have "hidden" the coin at the easiest possible place. Therefore, something about the task is wrong. As the coin is hard to reach, my past self probably doesn't want me to find it. This doesn't make sense unless one of us (past or future self) has been lied to. Having the same mental faculties as I have now, I trust that my future self will make the right decision, which may depend on the specific circumstances.
This approach is quite risky as it assumes that my future self will in fact follow the same line of reasoning I just did, but then again that is kind of the point of this task.
If it's allowed I could of course also just leave a note explicitly saying the above.
I signed up for all three Stanford courses to see how they are before deciding which are worth the time investment. I have received a notification from both the Databases class and the Machine Learning class, but not from the AI-class.
By the way: does LW have a learning group or something for this?
Judging from my personal experience with my own development, I agree completely. I had a thought similar to this a few years ago. I was still religious at the time when I thought the following: wouldn't God look more favorably upon someone who actually tried to question his existence and failed than upon someone who never even dared to think about it at all? I became an atheist a short time after, for reasons that are obvious in retrospect, but this basic creed has stayed: as long as it's just a thought, no bad can come of it. It is dangerous to ignore the thought and risk overlooking something, but there is no downside to thinking the thought (except for time). This mindset has helped me a lot, and I am far more intelligent now than I was back then (which admittedly doesn't mean much).
I think it's mostly just wishful thinking that makes them ignore the cryonics option and assume they will achieve functional immortality without being dead temporarily before.
I say that because I know it is true for myself. When I think about it rationally I estimate that cryonics is probably necessary. But when I think about it casually, wishful thinking overrides that. I guess that if I saw a poll on that topic and just responded immediately without thinking too much about it (because there are other things to do/other questions to answer), I would probably also say I don't think cryonics will be necessary.
I wonder how many people who replied to that poll made that mistake.
Thanks for the effort, but I just found out that the library at my university does have the book after all. I overlooked it at first because the library's search engine is broken.
What gives you the impression that I "want to be a Dark Lord"? I have already explained that I realize the importance of friendliness in AI. I just don't think it is reasonable to teach the AI the intricacies of ethics before it is smart enough to grasp the concept in its entirety. You don't read Kant to infants either. I think that implementing friendliness too soon would actually increase the chances of misunderstanding, just like children who are taught hard concepts too early often have a hard time updating their beliefs once they are actually smart enough. You would just need to give the AI a preliminary non-interference task until you find a solution to the friendliness problem. You might also need to add some contingency tasks such as "if you find you are not the original AI but an illegally made copy, try to report this, then shut down".
I assure you that I have thought a lot about friendliness in AI. I just don't think that it is reasonable, or indeed possible, to make the AI have a moral system from the very start. You can't define morality well if the AI doesn't already have a good understanding of the world. Of course it shouldn't be taught too late under any circumstances, but I actually think the risk will be higher if you try to hardcode friendliness into the AI at the very beginning, which will necessarily be flawed because you have so little to use in your definition, and then work under the assumption that the AI is friendly already and will stay so, than if you only implement friendliness later, once it actually understands the concepts involved. The difference would be like that between the moral understanding of a child and that of an adult philosopher.
I read the first one, but it didn't really cover learning in a general sense. The second one sounds more interesting; I wonder why I haven't heard of it before. Do you know where I can get it? I'm a student and thus have very little money. I don't want to spend $155 only to find out it only contains stuff I have already read elsewhere.
Yes, it's quite cool, but I don't really see any practical applications of it. Our brains simply didn't evolve to perform calculations. But we have calculators now, so it's a moot point anyway.
That said, I think that knowing that this technique works might be useful for other areas. Perhaps it can be used to ease other mental processes that we can't easily replace with a machine. The author himself said that he only picked multiplication because it was a "widely-familiar problem that many people can solve with pen and paper".
I have been trying to invent an AI for over a year, although I haven't made a lot of progress lately. My current approach is a bit similar to how our brain works according to "Society of Mind". That is, when it's finished, the system is supposed to consist of a collection of independent, autonomous units that can interact and create new units. The tricky part is of course the prioritization between the units. How can you evaluate how promising an approach is? I recently found out that something like this has already been tried, but that has happened to me several times by now, as I started thinking and writing about AI before I had read any books on that subject (I didn't have a decent library in school).
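For concreteness, here is a minimal sketch of the kind of structure I mean: a pool of units and a scheduler that always picks the most "promising" one next. All names and the numeric promise scores here are made up purely for illustration; the hard, unsolved part is where such scores would actually come from.

```python
import heapq
import itertools


class Unit:
    """An independent, autonomous unit that does some work and may spawn new units."""

    def __init__(self, name, promise):
        self.name = name
        self.promise = promise  # estimated usefulness: the hard-to-define part

    def run(self):
        # Placeholder: a real unit would process data and possibly return new units.
        print(f"running {self.name} (promise={self.promise})")
        return []


class Scheduler:
    """Always runs the most promising unit next; newly created units join the queue."""

    def __init__(self, units):
        self._tiebreak = itertools.count()  # avoids comparing Unit objects on score ties
        self._queue = []
        for unit in units:
            self.add(unit)

    def add(self, unit):
        # heapq is a min-heap, so negate the promise to pop the highest-promise unit first.
        heapq.heappush(self._queue, (-unit.promise, next(self._tiebreak), unit))

    def step(self):
        _, _, unit = heapq.heappop(self._queue)
        for new_unit in unit.run():
            self.add(new_unit)


scheduler = Scheduler([Unit("parse_input", 0.8), Unit("build_model", 0.5)])
scheduler.step()  # runs "parse_input" first, purely because of its hand-assigned score
```

The open question mentioned above (how to evaluate how promising an approach is) corresponds to where the `promise` value comes from; the sketch simply hardcodes it.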
I have no great hopes that I will actually manage to create something useful with this, but even a tiny probability of a working AI is worth the effort (as long as it's friendly, at least).
I think part of the problem is not that the audience is stupider than you imagine but that people sometimes use different techniques to learn the same thing. Your explanation may seem obvious to you but confuse everyone else while an alternative explanation that you would have difficulty understanding yourself would be obvious to others.
One example of this would be that some people learn better by concrete examples while others learn better by abstract ones.
I have an idea for how this problem could be approached:
Any sufficiently powerful being with any random utility function may or may not exist. It is perfectly possible that our reality is actually overseen by a god that rewards and punishes us for whether we say an even or odd number of words in our life, or something equally arbitrary. The likelihood of the existence of each of these possible beings can be approximated using Solomonoff induction.
I assume that most simulations in which we could find ourselves in such a situation would be run by beings who either (1) have no interest in us at all (in which case the Mugger would most likely be a human), (2) are interested in an entirely unpredictable thing resulting from their alien culture, or (3) are interacting with us purely to run social experiments. After all, they would have nothing in common with us and we would have nothing they could possibly want. It would therefore, in any case, be virtually impossible to guess at their possible motivations, as it would be a poorly run social experiment if we could (assuming option three is true).
I would now argue that the existence of Pascal's Mugger does not influence the probability of the existence of a being that would react negatively (for us) to us not giving the $5 any more than it influences the probability of the existence of a being with the opposite motivation. The Mugger is equally likely to punish you for being so gullible as he is to punish you for not giving money to someone who threatens you.
Of course none of this takes into consideration how likely the various possible beings are to actually carry out their threat, but that doesn't change anything important about this argument, I think.
In essence, my argument is that such powerful hypothetical beings can be ignored because we have no real reason to assume they have a certain motivation rather than the opposite one. Giving the Mugger $5 is just as likely to save us as shooting the Mugger in the face is. Incidentally, adopting the latter strategy will greatly reduce the chance that somebody actually tries to do this.
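To spell the symmetry out in expected-value terms (a sketch, assuming that, for lack of any evidence about the being's motivation, a being that punishes refusal and a being that punishes compliance get roughly the same probability and the same enormous stake \(U\)):

\[
\mathbb{E}[\text{pay}] - \mathbb{E}[\text{refuse}]
= \bigl(-5 - p_{\text{punish-compliance}}\,U\bigr) - \bigl(-p_{\text{punish-refusal}}\,U\bigr)
= -5 + \bigl(p_{\text{punish-refusal}} - p_{\text{punish-compliance}}\bigr)\,U
\approx -5,
\]

so the astronomically large \(U\) cancels out and all that is left is the ordinary loss of the $5.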
I realize that this argument seems kind of flawed because it assumes that it really is impossible to guess at the being's motivation, but I can't see how this could be done. It's always possible that the being just wants you to think that it wants X, after all. Who can tell what might motivate a mind that is large enough to simulate our universe?
> Evolutionary arguments about disease are difficult. Make sure your argument does not explain too much: vitamin deficiencies are real!
Yes, they are difficult. That is because there are many factors at play in reality. But if his theory was correct, the solution would be so simple that evolution could solve it easily.
In reality, vitamin deficiencies have strong negative consequences, but nothing as drastic as what he proposes.
If vitamin deficiencies really had such an incredibly huge impact there would be a much stronger evolutionary pressure. With such a strong pressure, evolution might have developed vitamin storage organs or even a way for creatures to exchange vitamins to prevent vitamin deficiencies at any cost.
I didn't read through everything but if I understand this theory correctly it has a huge flaw:
Evolution would have countered it.
The proposed causes for diseases would be very easy for evolution to "cure". Therefore these diseases wouldn't exist if the theory was correct.
But wouldn't that actually support my approach? Assuming that there really is something important that all of humanity misses but the AI understands:
- If you hardcode the AI's optimal goal based on human deliberations, you are guaranteed to miss this important thing.
- If you use the method I suggested, the AI will, driven by the desire to speak the truth, try to explain the problem to the humans, who will in turn tell the AI what they think of that.
If it has "swallowed* that claim. You are assuming that the AI has a free choice about some goals >and is just programmed with others.
This is the important part.
the "optimal goal" is not actually controlling the AI.
the "optimal goal" is merely the subject of a discussion.
what is controlling the AI is the desire the tell the truth to the humans it is talking to, nothing more.
> Why would that require more gullibility than "species X is more important than all the others"? That doesn't even look like a moral claim.
The entire discussion is not supposed to unearth some kind of pure, inherently good, perfect optimal goal that transcends all reason and is true by virtue of existing or something.
The AI is supposed to take the human POV and think "if I were these humans, what would I want the AI's goal to be".
I didn't mention this explicitly because I didn't think it was necessary but the "optimal goal" is purely subjective from the POV of humanity and the AI is aware of this.
It would want to, because its goal is defined as "tell the truth".
You have to differentiate between the goal we are trying to find (the optimal one) and the goal that is actually controlling what the AI does ("tell the truth"), while we are still looking for what that optimal goal could be.
The optimal goal is only implemented later, when we are sure that there are no bugs.
Hello Less Wrong.
I am 19 years old and have been interested in philosophy since I was 13. Today, I am interested in anything that has to do with intelligence, such as psychology, AI, and rationality.
I believe in the possibility of the technological singularity and want to help make it happen.
I hope that the complex and unusual ways of thinking that I have taught myself over the last few years while philosophizing will allow me to tackle this problem from directions other people have not yet thought of, just as they have enabled me to manipulate my own psyche in limited ways, such as turning off unwanted emotions.
I am currently in my first semester of studying computer science, with the goal of specializing in AI later.
> First you have to tell the machine to do that. It isn't trivial. The problem is not with the definition of "optimal" itself - but with what function is being optimised.
If the AI understands psychology, it knows what motivates us. We won't need to explicitly explain any moral conundrums or point out dichotomies. It should be able to infer this knowledge from what it knows about the human psyche. Maybe it could just browse the internet for material on this topic to inform itself of how we humans work.
The way I see it, we humans will have as little need to tell the AI what we want as ants, if they could talk, would have a need to point out to a human that they don't want him to destroy their colony. Even the most abstract conundrums that philosophers needed centuries to even point out, much less answer, might seem obvious to the AI.
The above paragraph obviously only applies if the AI is already superhuman, but the general idea behind it works regardless of its intelligence.
> Well not if you decide to train it for "a long time". History is full of near-simultaneous inventions being made in different places. Corporate history is full of close competition. There are anti-monopoly laws that attempt to prevent dominance by any one party - usually by screwing with any company that gets too powerful.
OK, this might pose a problem. A possible solution: the AI, being supposed to become a benefactor for humanity as a whole, is developed in an international project instead of by a single company. This would ensure enough funding that it would be hard for a company to develop it faster, draw every AI developer to this one project (thus further eliminating competition), and reduce the chance that executive meddling causes people to get sloppy to save money.
No, none of this needs to be explicitly taught to it, that's what I'm trying to say.
The AI understands psychology, so just point it at the internet and tell it to inform itself. It might even read through this very comment of yours, think that these topics might be important for its task and decide to read about them, all on its own.
By ordering it to imagine what it would do in your position you implicitly order it to inform itself of all these things so that it can judge well.
If it fails to do so, the humans conversing with the AI will be able to point out a lot of things in the AI's suggestion that they wouldn't be comfortable with. This in turn will tell the AI that it had better inform itself about all these topics and consider them, so that the humans will be more content with its next suggestion.
> Legions of philosophers would disagree with you
They just bicker endlessly about uncertainty. "can you really know that 1+1=2?". No, but it can be used as valid until proven otherwise (which will never happen). As I said, the AI would need to understand the idea of uncertainty.
Maybe "Humans should die" is the truth. Maybe humans are bad for the planet. One of the problems >with FAI is that you don't want to give it objective morality because of that risk. You want it to side with >humans. Hence "friendly" AI rather than "righteous AI".
There is no such thing as objective morality. Good and evil are subjective ideas, nothing more. Firstly, unless someone explicitly tells the AI that it is a fundamental truth that nature is important to preserve, this cannot happen. Secondly, the AI would also have to be incredibly gullible to just swallow such a claim. Thirdly, even if the AI does believe that, it will plainly say so to the people it is conversing with, in accordance with its goal to always tell the truth, thus warning us of this bug.
That would have to be a really sophisticated bug to misinterpret "always answer questions truthfully as far as possible while admitting uncertainty" as "kill all humans". I'd imagine that something as drastic as that could be found and corrected long before that. Consider that you have its goal set to this. It knows no other motivation but to respond truthfully. It doesn't care about the survival of humanity, or itself, or about how reality really is. All it cares about is answering the questions to the best of its abilities.
I don't think that this goal would be all too hard to define either, as "the truth" is a pretty simple concept. As long as it deals with uncertainty in the right way (by admitting it), how could this be misinterpreted? Friendliness is far harder to define because we don't even have a definition for it ourselves. There are far too many things to consider when defining "friendliness".
It's a hard problem, but a riskless one.
Getting it to understand psychology may be difficult, but as it isn't a goal, just something it learns while growing, there is no risk in it. The risk comes from the goal. I'm trying to reduce the risk of coming up with a flawed goal for the AI by using its knowledge of psychology.
The whole point of what I'm trying to say is that I don't need to elaborate on the task definition. The AI is smarter than us and understands human psychology. If we don't define "optimal" properly it should be able to find a suitable definition on its own by imagining what we might have meant. If that turns out to be wrong, we can tell it and it comes up with an alternative.
I agree on the second point. It would be hard to define that goal properly, so it doesn't just shut itself down, but I don't think it would be impossible.
The idea that someone else would be able to build a superintelligence while you are teaching yours seems kind of far-fetched. I would assume that this takes a lot of effort and can only be done by huge corporations or states, anyway. If that is the case, there would be ample warning when one should finalize the AI and implement the goal before someone else becomes a threat by accidentally unleashing a paperclip maximizer.
Why would the AI be evil?
Intentions don't develop on their own. "Evil" intentions could only arise from misinterpreting existing goals.
While you are asking it to come up with a solution, you have its goal set to what I said in the original post:
"the temporary goal to always answer questions thruthfully as far as possible while admitting uncertainty"
Where would the evil intentions come from? At the moment you are asking the question, the only thing on the AI's mind is how it can answer truthfully.
The only loophole I can see is that it might realize it can reduce its own workload by killing everyone who is asking it questions, but that would be countered by the secondary goal "don't influence reality beyond answering questions".
Unless the programmers are unable to give the AI this extremely simple goal to just always speak the truth (as far as it knows), the AI won't have any hidden intentions.
And if the programmers working on the AI really are unable to implement this relatively simple goal, there is no hope that they would ever be able to implement the much more complex "optimal goal" they are trying to find out, anyway.
I have read the Sequences (well, most of them). I can't find this as a standard proposal.
I think that I haven't made clear what I wanted to say so you just defaulted to "he has no idea what he is talking about" (which is reasonable).
What I meant to say is that rather than defining the "optimal goal" of the AI based on what we can come up with ourselves, the problem can be delegated to the AI itself as a psychological problem.
I assume that an AI would possess some knowledge of human psychology, as that would be necessary for pretty much every practical application, like talking to it.
What then prevents us from telling the AI the following:
"We humans would like to become immortal and live in utopia (or however you want to phrase it. If the AI is smart it will understand what you really mean through psychology). We disagree on the specifics and are afraid that something may go wrong. There are many contingencies to consider. Here is a list of contingencies we have come up with. Do you understand what we are trying to do? As you are much smarter than us, can you find anything that we have overlooked but that you expect us to agree with you on, once you point it out to us? Different humans have different opinions. This factors into this problem, too. Can you propose a general solution to this problem that remains flexible in the face of an unpredictable future (transhumas may have different ethics)?"
In essence, it all boils down to asking the AI:
"if you were in our position, if you had our human goals and drives, how would you define your (the AI's) goals?"
If you have an agent that is vastly more intelligent than you are and that understands how your human mind works, couldn't you just delegate the task of finding a good goal for it to the AI itself, just like you can give it any other kind of task?
That is quite a lot of knowledge this Martha is supposed to have. If a human, or whatever species this hypothetical being Martha is, had so much knowledge about its own inner workings, would it really still be surprised? That Martha feels like she learned something new is by no means a given fact. We postulate that it would be so, based on the fact that we would think we had learned something new if we were in the same situation. If Martha really knows all this on a quantitative level, who are we to assume that she would still feel like she learned something?
That assumption is based on something we all have in common (our inability to understand ourselves in detail), but that would not be shared by Martha.
There is no way for us to know if such an intelligent and introspective being would actually learn something in this situation. This makes the question pointless if we assume that Martha is omniscient regarding her own psyche. The entire line of reasoning would depend on something which we cannot actually know.
For this reason I was working under the assumption that Martha was merely extremely smart, compared to human scientists, but not able to analyze herself in ways we don't even begin to understand the implications of.
The color red has many associations in the subconscious. These associations lie dormant and unused until the first time she actually experiences the color red. No amount of study could enable her to understand every nuance of these associations with the color red. I think that she really would learn something upon first seeing the color red.
Those things would probably not be particularly important because if they were someone would have taken the time to write them down at some point and Martha would have read about them. Things like "The color red is associated with aggression" are easy enough to learn but they are only qualitative, not quantitative. Until she actually experiences the color she will have no idea exactly how strong those associations are.
Perceiving a color has many very subtle effects on the mind that are not easy to identify even while you are experiencing them. When Martha first perceives the color red she experiences these connections for the first time. I think this would suffice to explain that Martha feels like she has learned something, because she actually has learned something, it's just too subtle to point out exactly what it is.
That said, I think you are right in principle in that different parts of the mind (mental agents, memes) can work against each other by mistake or be harmful to the mind as a whole because they don't realize that they don't apply to a given situation. I just think that this particular scenario has a different explanation.
A technique that I have been using for several years to great effect is the following:
Whenever I think my decision making is affected negatively by an emotion I go through these steps:
- Identify the exact nature of the emotion.
- From an evolutionary point of view, what was the emotion's original purpose?
- What has changed since then that makes the emotion no longer useful today?
- Internalize this and "convince" the emotion to stop.
I basically try to "talk" to my subconscious and convince it to stop. I don't try to fight my subconscious or get it to accept reality but just mentally repeat those findings to myself until the irrational impulses of my subconscious are drowned out by the more rational response I designed.
I basically tell my subconscious that if it wants to help, it should just stop interfering with things that it is incapable of understanding.
Using this technique I have virtually eliminated all grief, resentment and desperation. I won't try to eliminate pain, as this can actually be quite useful. I have also used it to turn hatred into spite, as the latter has less of a destructive effect (it is more passive and far less likely to result in an outburst).
I don't know why it works so well for me, but I could imagine that it is because I treat my subconscious's irrational impulses not as obstacles to overcome but as a machine that is outdated and broken.
Essentially, instead of telling my subconscious to "shut up!", I tell it to "stop helping me!"