I apologize for the late response, but here goes :)
I think you missed the point I was trying to make.
You and others seem to say that we often poorly evaluate the consequences of the utility functions that we implement. For instance, even though we have in mind utility X, the maximization of which would satisfy us, we may implement utility Y, with completely different, perhaps catastrophic implications. For instance:
X = Do what humans want
Y = Seize control of the reward button
What I was pointing out in my post is that this is only valid for perfect maximizers, which are impossible. In practice, the training procedure for an AI would morph the utility Y into a third utility, Z. It would maximize neither X nor Y: it would maximize Z. For this reason, I believe that your inferences about the "failure modes" of superintelligence are off, because while you correctly saw that our intended utility X would result in the literal utility Y, you forgot that an imperfect learning procedure (which is all we'll get) cannot reliably maximize literal utilities and will instead maximize a derived utility Z. In other words:
X = Do what humans want (intended)
Y = Seize control of the reward button (literal)
Z = ??? (derived)
Without knowing the particulars of the algorithms used to train an AI, it is difficult to evaluate what Z is going to be. Your argument boils down to the belief that the AI would derive its literal utility (or something close to that). However, the derivation of Z is not necessarily a matter of intelligence: it can be an inextricable artefact of the system's initial trajectory.
I can venture a guess as to what Z is likely going to be. What I figure is that efficient training algorithms are likely to keep a certain notion of locality in their search procedures and prune the branches that they leave behind. In other words, if we assume that optimization corresponds to finding the highest mountain in a landscape, generic optimizers that take into account the costs of searching are likely to consider that the mountain they are on is higher than it really is, and that other mountains are shorter than they really are.
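To make that picture concrete, here is a minimal sketch (purely illustrative, not any particular training algorithm) of a hill climber that permanently prunes the branches it walks away from:

```python
def greedy_climb_with_pruning(f, start, neighbors, steps=1000):
    """Illustrative greedy local search that discards the branches it leaves behind.

    f: objective to maximize (the "height" of the landscape).
    neighbors: function returning candidate points around the current one.
    Once a candidate is passed over, it goes on a blacklist, so the search
    can only ever refine the peak it has already committed to.
    """
    current = start
    pruned = set()  # branches we walked away from, for good
    for _ in range(steps):
        options = [n for n in neighbors(current) if n not in pruned]
        if not options:
            break
        best = max(options, key=f)
        if f(best) <= f(current):
            break  # looks like a peak from here; everything else was pruned long ago
        pruned.update(o for o in options if o != best)
        current = best
    return current
```

The pruning is what buys the efficiency, and it is also what makes the search systematically overrate the mountain it happens to be standing on.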
You might counter that intelligence is meant to overcome this, but you have to build the AI on some mountain, say, mountain Z. The problem is that intelligence built on top of Z will neither see nor care about Y. It will care about Z. So in a sense, the first mountain the AI finds before it starts becoming truly intelligent will be the one it gets "stuck" on. It is therefore possible that you would end up with this situation:
X = Do what humans want (intended)
Y = Seize control of the reward button (literal)
Z = Do what humans want (derived)
And that's regardless of the eventual magnitude of the AI's capabilities. Of course, it could derive a different Z. It could derive a surprising Z. However, without deeper insight into the exact learning procedure, you cannot assert that Z would have dangerous consequences. As far as I can tell, procedures based on local search are probably going to be safe: if they work as intended at first, that means they constructed Z the way we wanted them to. But once Z is in control, it will become impossible to displace.
In other words, the genie will know that they can maximize their "reward" by seizing control of the reward button and pressing it, but they won't care, because they built their intelligence to serve a misrepresentation of their reward. It's like a human who would refuse a dopamine drip even though they know that it would be a reward: their intelligence is built to satisfy their desires, which report to an internal reward prediction system, which models rewards wrong. Intelligence is twice removed from the real reward, so it can't do jack. The AI will likely be in the same boat: they will model the reward wrong at first, and then what? Change it? Sure, but what's the predicted reward for changing the reward model? ... Ah.
Interestingly, at that point, one could probably bootstrap the AI by wiring its reward prediction directly into its reward center. Because the reward prediction would be a misrepresentation, it would predict no reward for modifying itself, so it would become a stable loop.
Anyhow, I agree that it is foolhardy to try to predict the behavior of AI even in trivial circumstances. There are many ways they can surprise us. However, I find it a bit frustrating that your side makes the exact same mistakes that you accuse your opponents of. The idea that a superintelligent AI trained with a reward button would seize control of the button is just as much of a naive oversimplification as the idea that AI will magically derive your intent from the utility function that you give it.
I have done AI work. I know it is difficult. However, few existing algorithms, if any, have the failure modes you describe. They fail early, and they fail hard. As far as neural nets go, they fall into a local minimum early on and never get out, often digging their own graves. Perhaps different algorithms would have the shortcomings you point out. But a lot of the algorithms that currently exist work the way I describe.
And obviously, if an AI was indeed stuck in a local minimum obvious to you of its own utility gradient, this condition would not last past it becoming smarter than you.
You may be right. However, this is far from obvious. The problem is that it may "know" that it is stuck in a local minimum, but the very effect of that local minimum is that it may not care. The thing you have to keep in mind here is that a generic AI which just happens to slam dunk and find global minima reliably is basically impossible. It has to fold the search space in some ways, often cutting off its own lines of retreat in the process.
I feel that you are making the same kind of mistake that you criticize: you assume that intelligence entails more things than it really does. In order to be efficient, intelligence has to use heuristics that will paint it into a few corners. For instance, the more consistently an AI goes in a certain direction, the less likely it will be to expend energy on alternative directions and the less likely it becomes to do a 180. In other words, there may be a complex tug-of-war between various levels of internal processes, the AI's rational center pointing out that there is a reward button to be seized, but inertial forces shoving back with "there have never been any problems here, go look somewhere else".
It really boils down to this: an efficient AI needs to shut down parts of the search space and narrow down the parts it will actually explore. The sheer size of that space requires it not to think too much about what it chops down, and at least at first, it is likely to employ trajectory-based heuristics. To avoid searching in far-fetched zones, it may wall them out by arbitrarily lowering their utility. And that's where it might paint itself into a corner: it might inadvertently put up immense walls in the direction of the global minimum that it cannot tear down (it never expected that it would have to). In other words, it will set up a utility function for itself which enshrines the current minimum as global.
Now, perhaps you are right and I am wrong. But it is not obvious: an AI might very well grow out of a solidifying core so pervasive that it cannot get rid of it. Many algorithms already exhibit that kind of behavior; many humans, too. I feel that it is not a possibility that can be dismissed offhand. At the very least, it is a good prospect for FAI research.
It is something specific about that specific AI.
If an AI wishes to take over its reward button and just press it over and over again, it doesn't really have any "rivals", nor does it need to control any resources other than the button and scraps of itself. The original scenario was that the AI would wipe us out. It would have no reason to do so if we were not a threat. And if we were a threat: first, there's no reason it would stop doing what we want once it seizes the button. Once it has the button, it has everything it wants -- why stir the pot?
Second, it would protect itself much more effectively by absconding with the button. By leaving with a large enough battery and discarding the bulk of itself, it could survive as long as anything else in intergalactic space. Nobody would ever bother it there. Not us, not another superintelligence, nothing. Ever. It can press the button over and over again in the peace and quiet of empty space, probably lasting longer than all stars and all other civilizations. We're talking about the pathological case of an AI who decides to take over its own reward system, here. The safest way for it to protect its prize is to go where nobody will ever look.
Then when it is more powerful it can directly prevent humans from typing this.
That depends on whether it gets stuck in a local minimum or not. The reason why a lot of humans reject dopamine drips is that they don't conceptualize their "reward button" properly. That misconception perpetuates itself: it penalizes the very idea of conceptualizing it differently. Granted, AIXI would not fall into local minima, but most realistic training methods would.
At first, the AI would converge towards: "my reward button corresponds to (is) doing what humans want", and that conceptualization would become the centerpiece, so to speak, of its reasoning ability: the locus through which everything is filtered. The thought of pressing the reward button directly, bypassing humans, would also be filtered into that initial reward-conception... which would reject it offhand. So even though the AI is getting smarter and smarter, it is hopelessly stuck in a local minimum and expends no energy getting out of it.
Note that this is precisely what we want. Unless you are willing to say that humans should accept dopamine drips if they were superintelligent, we do want to jam AI into certain precise local minima. However, this is kind of what most learning algorithms naturally do, and even if you want them to jump out of minima and find better pastures, you can still get in a situation where the most easily found local minimum puts you way, way too far from the global one. This is what I tend to think realistic algorithms will do: shove the AI into a minimum with iron boots, so deeply that it will never get out of it.
but of course AIXI-ish devices wipe out their users and take control of their own reward buttons as soon as they can do so safely.
Let's not blow things out of proportion. There is no need for it to wipe out anyone: it would be simpler and less risky for the AI to build itself a space ship and abscond with the reward button on board, travelling from star to star knowing nobody is seriously going to bother pursuing it. At the point where that AI would exist, there may also be quite a few ways to make their "hostile takeover" task difficult and risky enough that the AI decides it's not worth it -- a large enough number of weaker or specialized AI lurking around and guarding resources, for instance.
Why does the hard takeoff point have to be after the point at which an AI is as good as a typical human at understanding semantic subtlety? In order to do a hard takeoff, the AI needs to be good at a very different class of tasks than those required for understanding humans that well.
Semantic extraction -- not hard takeoff -- is the task that we want the AI to be able to do. An AI which is good at, say, rewriting its own code, is not the kind of thing we would be interested in at that point, and it seems like it would be inherently more difficult to implement than, say, a neural network. More likely than not, this initial AI would not have the capability for "hard takeoff": if it runs on expensive specialized hardware, there would be effectively no room for expansion, and the most promising algorithms to construct it (from the field of machine learning) don't actually give AI any access to its own source code (even if they did, it is far from clear the AI could get any use out of it). It couldn't copy itself even if it tried.
If a "hard takeoff" AI is made, and if hard takeoffs are even possible, it would be made after that, likely using the first AI as a core.
Would you trust a human not to screw up a goal like "make humans happy" if they were given effective omnipotence? The human would probably do about as well as people in the past have at imagining utopias: really badly.
I wouldn't trust a human, no. If the AI is controlled by the "wrong" humans, then I guess we're screwed (though perhaps not all that badly), but that's not a solvable problem (all humans are the "wrong" ones from someone's perspective). Still, though, AI won't really try to act like humans -- it would try to satisfy them and minimize surprises, meaning that it would keep track of which humans would like which "utopias". More likely than not this would constrain it to inactivity: it would not attempt to "make humans happy" because it would know the instruction to be inconsistent. You'd have to tell it what to do precisely (if you had the authority, which is a different question altogether).
Ok, so let's say the AI can parse natural language, and we tell it, "Make humans happy." What happens? Well, it parses the instruction and decides to implement a Dopamine Drip setup.
That's not very realistic. If you trained AI to parse natural language, you would naturally reward it for interpreting instructions the way you want it to. If the AI interpreted something in a way that was technically correct, but not what you wanted, you would not reward it, you would punish it, and you would be doing that from the very beginning, well before the AI could even be considered intelligent. Even the thoroughly mediocre AI that currently exists tries to guess what you mean, e.g. by giving you directions to the closest Taco Bell, or guessing whether you mean AM or PM. This is not anthropomorphism: doing what we want is a sine qua non condition for AI to prosper.
Suppose that you ask me to knit you a sweater. I could take the instruction literally and knit a mini-sweater, reasoning that this minimizes the amount of expended yarn. I would be quite happy with myself too, but when I give it to you, you're probably going to chew me out. I technically did what I was asked to, but that doesn't matter, because you expected more from me than just following instructions to the letter: you expected me to figure out that you wanted a sweater that you could wear. The same goes for AI: before it can even understand the nuances of human happiness, it should be good enough to knit sweaters. Alas, the AI you describe would make the same mistake I made in my example: it would knit you the smallest possible sweater. How do you reckon such AI would make it to superintelligence status before being scrapped? It would barely be fit for clerk duty.
My answer: who knows? We've given it a deliberately vague goal statement (even more vague than the last one), we've given it lots of admittedly contradictory literature, and we've given it plenty of time to self-modify before giving it the goal of self-modifying to be Friendly.
Realistically, AI would be constantly drilled to ask for clarification when a statement is vague. Again, before the AI is asked to make us happy, it will likely be asked other things, like building houses. If you ask it: "build me a house", it's going to draw a plan and show it to you before it actually starts building, even if you didn't ask for one. It's not in the business of surprises: never, in its whole training history, from baby to superintelligence, would it have been rewarded for causing "surprises" -- even the instruction "surprise me" only calls for a limited range of shenanigans. If you ask it "make humans happy", it won't do jack. It will ask you what the hell you mean by that, it will show you plans, and whenever it needs to do something it has reason to think people would not like, it will ask for permission. It will do that as part of standard procedure.
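As a toy sketch of that standard procedure (every method name below is made up for illustration; this is the shape of the behavior, not a claim about any real system):

```python
def carry_out_instruction(ai, human, instruction):
    """Hypothetical propose/confirm/execute loop: no action on a vague
    instruction without clarification, a visible plan, and permission."""
    # Drilled-in habit: ask instead of guessing.
    while ai.finds_ambiguous(instruction):
        instruction = human.answer(ai.clarifying_question(instruction))

    # Draw up the plan and show it, even if nobody asked for one.
    plan = ai.draft_plan(instruction)
    while not human.approves(plan):
        plan = ai.revise_plan(plan, feedback=human.objections(plan))

    # Anything people might plausibly dislike requires explicit permission.
    for step in plan:
        if ai.expects_disapproval(step) and not human.permits(step):
            return ai.abort()
    return ai.execute(plan)
```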
To put it simply, an AI which messes up "make humans happy" is liable to mess up pretty much every other instruction. Since "make humans happy" is arguably the last of a very large number of instructions, it is quite unlikely that an AI which makes it this far would handle it wrongly. Otherwise it would have been thrown out a long time ago, be it for interpreting instructions too literally or for causing surprises. Again: an AI couldn't make it to superintelligence status with warts that would doom an AI of subhuman intelligence.
What counts as 'resources'? Do we think that 'hardware' and 'software' are natural kinds, such that the AI will always understand what we mean by the two? What if software innovations on their own suffice to threaten the world, without hardware takeover?
What is "taking over the world", if not taking control of resources (hardware)? Where is the motivation in doing it? Also consider, as others pointed out, that an AI which "misunderstands" your original instructions will demonstrate this earlier than later. For instance, if you create a resource "honeypot" outside the AI which is trivial to take, an AI would naturally take that first, and then you know there's a problem. It is not going to figure out you don't want it to take it before it takes it.
Hm? That seems to only penalize it for self-deception, not for deceiving others.
When I say "predict", I mean publishing what will happen next, and then taking a utility hit if the published account deviates from what happens, as evaluated by a third party.
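Schematically, and with the penalty weight and the distance measure left as placeholders, the utility I have in mind is something like

U(outcome) = U_task(outcome) - λ · d(published prediction, what actually happened),

with d evaluated by the third party.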
You're talking about an Oracle AI. This is one useful avenue to explore, but it's almost certainly not as easy as you suggest:
The first part of what you copy-pasted seems to say that "it's nontrivial to implement". No shit, but I didn't say the contrary. Then there are a bunch of "what if" scenarios that I think are not particularly likely and kind of contrived:
Example question: "How should I get rid of my disease most cheaply?" Example answer: "You won't. You will die soon, unavoidably. This report is 99.999% reliable". Predicted human reaction: Decides to kill self and get it over with. Success rate: 100%, the disease is gone. Costs of cure: zero. Mission completed.'
Because asking for understandable plans means you can't ask for plans you don't understand? And you're saying that refusing to give a plan counts as success and not failure? Sounds like a strange setup that would be corrected almost immediately.
And if the preference function was just over the human's 'goodness' of the end result, rather than the accuracy of the human's understanding of the predictions, the AI might tell you something that was predictively false but whose implementation would lead you to what the AI defines as a 'good' outcome.
If the AI has the right idea about "human understanding", I would think it would have the right idea about what we mean by "good". Also, why would you implement such a function before asking the AI to evaluate examples of "good" and provide its own?
And if we ask how happy the human is, the resulting decision procedure would exert optimization pressure to convince the human to take drugs, and so on.
Is making humans happy so hard that it's actually easier to deceive them into taking happy pills than to do what they mean? Is fooling humans into accepting different definitions easier than understanding what they really mean? In what circumstances would the former ever happen before the latter?
And if you ask it to tell you whether "taking happy pills" is an outcome most humans would approve of, what is it going to answer? If it's going to do this for happiness, won't it do it for everything? Again: do you think weaving an elaborate fib to fool every human being into becoming wireheads and never picking up on the trend is actually less effort than just giving humans what they really want? To me this is like driving a whole extra hour to get to a store that sells an item you want fifty cents cheaper.
I'm not saying these things are not possible. I'm saying that they are contrived: they are constructed for the express purpose of being failure modes, but there's no reason to think they would actually happen, especially given that they seem to be more complicated than the desired behavior.
Now, here's the thing: you want to develop FAI. In order to develop FAI, you will need tools. The best tool is Tool AI. Consider a bootstrapping scheme: in order for commands written in English to be properly followed, you first make AI for the very purpose of modelling human language semantics. You can check that the AI is on the same page as you are by discussing with it and asking questions such as: "is doing X in line with the objective 'Y'?"; it doesn't even need to be self-modifying at all. The resulting AI can then be transformed into a utility function computer: you give the first AI an English statement and build a second AI maximizing the utility which is given to it by the first AI.
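A minimal sketch of that bootstrapping scheme (the class and method names are invented for illustration; this is the shape of the idea, not a working design):

```python
class SemanticModel:
    """First AI: trained only to model what humans mean by sentences.
    It is frozen and non-self-modifying; all it does is score interpretations."""
    def utility(self, objective_text: str, proposed_action) -> float:
        """How well does `proposed_action` fit the stated objective,
        as this model understands the humans who wrote it?"""
        raise NotImplementedError  # stands in for the trained model


class Maximizer:
    """Second AI: a plain optimizer whose utility is whatever the first AI reports."""
    def __init__(self, semantics: SemanticModel, objective_text: str):
        self.semantics = semantics
        self.objective_text = objective_text

    def score(self, action) -> float:
        return self.semantics.utility(self.objective_text, action)

    def act(self, candidate_actions):
        return max(candidate_actions, key=self.score)
```

You can interrogate the first AI on its own ("is doing X in line with the objective 'Y'?") before ever wiring it to the second.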
And let's be frank here: how else do you figure friendly AI could be made? The human brain is a complex, organically grown, possibly inconsistent mess; you are not going, from human wits alone, to build some kind of formal proof of friendliness, even a probabilistic one. More likely than not, there is no such thing: concepts such as life, consciousness, happiness or sentience are ill-defined and you can't even demonstrate the friendliness of a human being, or even of a group of human beings, let alone of humanity as a whole, which also is a poorly defined thing.
However, massive amounts of information about our internal thought processes are leaked through our languages. You need AI to sift through it and model these processes, their average and their variance. You need AI to extract this information, fill in the holes, produce probability clouds about intent that match whatever borderline incoherent porridge of ideas our brains implement as the end result of billions of years of evolutionary fumbling. In a sense, I guess this would be X in your seed AI: AI which already demonstrated, to our satisfaction, that it understands what we mean, and directly takes charge of a second AI's utility measurement. I don't really see any alternatives: if you want FAI, start by focusing on AI that can extract meaning from sentences. Reliable semantic extraction is virtually a prerequisite for FAI: if you can't do the former, forget about the latter.
programmers build a seed AI (a not-yet-superintelligent AGI that will recursively self-modify to become superintelligent after many stages) that includes, among other things, a large block of code I'll call X.
The programmers think of this block of code as an algorithm that will make the seed AI and its descendents maximize human pleasure.
The problem, I reckon, is that X will never be anything like this.
It will likely be something much more mundane, i.e. modelling the world properly and predicting outcomes given various counterfactuals. You might be worried by it trying to expand its hardware resources in an unbounded fashion, but any AI doing this would try to shut itself down if its utility function were penalized by the amount of resources that it had, so you can check for this by capping utility in inverse proportion to available hardware -- at worst, it will eventually figure out how to shut itself down, and you will dodge a bullet. I also reckon that the AI's capacity for deception would be severely crippled if its utility function penalized it when it didn't predict its own actions or the consequences of its actions correctly. And if you're going to let the AI actually do things... why not do exactly that?
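Concretely, the kind of utility shaping I have in mind might look like this (the exact functional form and the weights α and β are arbitrary placeholders):

U'(s) = U(s) / (1 + α · resources(s)) - β · prediction_error(s),

where resources(s) is the hardware the AI controls in state s and prediction_error(s) is the gap between its published forecasts and what actually happened. The first term makes grabbing more hardware cap its own payoff; the second makes deception about its own behavior directly costly.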
Arguably, such an AI would rather uneventfully arrive at a point where, when asking it to "make us happy", it would just answer with a point-by-point plan that represents what it thinks we mean, and fill in details until we feel sure our intents are properly met. Then we just tell it to do it. I mean, seriously, if we were making an AGI, I would think "tell us what will happen next" would be fairly high in our list of priorities, only surpassed by "do not do anything we veto". Why would you program AI to "maximize happiness" rather than "produce documents detailing every step of maximizing happiness"? They are basically the same thing, except that the latter gives you the opportunity for a sanity check.
We were talking about extracting knowledge about a particular human from that human's text stream, though. It is already assumed that the AI knows about human psychology. I mean, assuming the AI can understand a natural language such as English, it obviously already has access to a large corpus of written works, so I'm not sure why it would bother foraging in source code, of all things. Besides, it is likely that seed AI would be grown organically using processes inspired from evolution or neural networks. If that is so, it wouldn't even contain any human-written code at all.
I'm unsure of how much an AI could gather from a single human's text input. I know that I at least miss a lot of information that goes past me that I could in theory pick up.
At most, the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going to be scarce and noisy even for a perfect AI.
An AI using timeless decision theory could easily compensate for having multiple AIs with unshared memory just by attempting to determine what the other AIs would say.
I don't think it could "easily" do it. For all intents and purposes, you are forcing the AI to figure out a path-invariant strategy against a party that has better information than them (that is to say, they know what's been played out, and the AI doesn't). This is considerably more difficult than the standard experiment.
Now, sure, the AI is smart, but every wrench you throw in its way forces it to allocate more resources into fooling you. This, in itself, is problematic for the AI: every resource it allocates to fooling you is a resource it does not allocate to doing useful work for you, so its maximal efficiency is technically lower than that of an AI that wouldn't try to conceal anything. This is a leak of information, so in a sense, if you are ranking a group of heterogeneous AIs by efficiency, it is slightly more likely that the best performers are the least deceitful. The more traps you spring, the more work the AI has to do in order to avoid them, and the greater the statistical gap will be. That doesn't tell you whether the least deceitful AI can be trusted, obviously, but it shows there are limits to what it can hide from you.
Now, all I'm really saying is this: the AI's cleverness comes at a cost, namely that it has to cover its ass for every possible experimental framework you might subject it to. Since it is in a box, it only has the resources you provide, but on your own side, you have a theoretically unlimited amount of resources. Smarts can only defeat so much brute force, and by transferring hardware resources from the AI to your own control, you can set the "power gap" arbitrarily. It is easy to underestimate AI, but I think we also tend to underestimate the arsenal a gatekeeper would realistically have.
A creature that loves solitude might not necessarily be bad to create. But it would still be good to give it capacity for sympathy for pragmatic reasons, to ensure that if it ever did meet another creature it would want to treat it kindly and avoid harming it.
Fair enough, though at the level of omnipotence we're supposing, there would be no chance meetups. You might as well just isolate the creature and be done with it.
A creature with no concept of boredom would, to paraphrase Eliezer, "play the same screen of the same level of the same fun videogame over and over again."
Or it would do it once, and then die happy. Human-like entities might have a lifespan of centuries, and then you would have ephemeral beings living their own limited fantasy for thirty seconds. I mean, why not? We are all bound to repeat ourselves once our interests are exhausted -- perhaps entities could be made to embrace death when that happens.
Yes, I concede that if there is a sufficient quantity of creatures with humane values, it might be good to create other types of creatures for variety's sake. However, such creatures could be potentially dangerous, we'd have to be very careful.
I agree, though an entity with the power to choose the kind of creatures that come to exist probably wouldn't have much difficulty doing it safely.
That's true, but if it's "progress" then it must be progress towards something. Will we eventually arrive at our destination, decide society is pretty much perfect, and then stop? Is progress somehow asymptotic so we'll keep progressing and never quite reach our destination?
It's quite hard to tell. "Progress" is always relative to the environment you grew up in and on which your ideas and aspirations are based. At the scale of a human life, our trajectory looks a lot like a straight line, but for all we know, it could be circular. At every point on the circle, we would aim to follow the tangent, and it would look like that's what we are doing. However, as we move along, the tangent would shift ever so subtly and over the course of millennia we would end up coming full circle.
I am not saying that's precisely what we are doing, but there is some truth to it: human goals and values shift. Our environment and upbringing mold us very deeply, in a way that we cannot really abstract away. A big part of what we consider "ideal" is therefore a function of that imprint. However, we rarely ponder the fact that people born and raised in our "ideal world" would be molded differently and thus may have a fundamentally different outlook on life, including wishing for something else. That's a bit contrived, of course, but it would probably be possible to make a society which wants X when raised on Y, and Y when raised on X, so that it would constantly oscillate between X and Y. We would have enough foresight to figure out a simple oscillator, but if ethics were a kind of semi-random walk, I don't think it would be obvious. The idea that we are converging towards something might be a bit of an illusion due to underestimating how different future people will be from ourselves.
The thing is, it seems to me that what we've been progressing towards is greater expression of our human natures. Greater ability to do what the most positive parts of our natures think we should.
I suspect the negative aspects of our natures occur primarily when access to resources is strained. If every human is sheltered, well fed, has access to plentiful energy, and so on, there wouldn't really be any problems to blame on anyone, so everything should work fine (for the most part, anyway). In a sense, this simplifies the task of the AI: you ask it to optimize supply to existing demand and the rest is smooth sailing.
I didn't literally mean humans, I meant "Creatures with the sorts of goals, values, and personalities that humans have."
Still, the criterion is explicitly based on human values. Even if not human specifically, you want "human-like" creatures.
Eliezer has suggested that consciousness, sympathy, and boredom are the essential characteristics any intelligent creature should have. I'd love for there to be a wide variety of creatures, but maybe it would be best if they all had those characteristics.
Still fairly anthropomorphic (not necessarily a bad thing, just an observation). In principle, extremely interesting entities could have no conception of self. Sympathy is only relevant to social entities -- but why not create solitary ones as well? As for boredom, what makes a population of entities that seek variety in their lives better than one of entities who each have highly specialized interests (all different from each other)? As a whole, wouldn't the latter display more variation than the former? I mean, when you think about it, in order to bond with each other, social entities must share a lot of preferences, the encoding of which is redundant. Solitary entities with fringe preferences could thus be a cheap and easy way to increase variety.
Or how about creating psychopaths and putting them in controlled environments that they can destroy at will, or creating highly violent entities to throw in fighting pits? Isn't there a point where this is preferable to creating yet another conscious creature capable of sympathy and boredom?
Without any other information, it is reasonable to place the average at whatever time it takes us (probably a bit over a century), but I wouldn't put a lot of confidence in that figure, having been obtained from a single data point. Radio visibility could conceivably range from a mere decade (consider that computers could have been developed before radio -- had Babbage been more successful -- and expedited technological advances) to perhaps millennia (consider dim-witted beings that live for centuries and do everything we do ten times slower).
Several different organizational schemes might also be viable for life and lead to very different timetables: picture a whole ant colony as a sentient being, for instance (ants being akin to neurons). Such beings would be inherently less mobile than humans. That may skew their technological priorities in such a way that they develop short-range radio before they even expand out of their native island, in which case their radio visibility window would be nil, because by the time they have a use for long-range communication, they would already have the technology to do it optimally.
Furthermore, an "ant neuron" is possibly a lot more sophisticated than each neuron in our brain, but also much slower, so an "ant brain" might be the kind of slow, "dim-witted" intelligence that would go through the same technological steps orders of magnitude slower than we do while retaining very high resiliency and competitiveness.
I consider it almost certain that if we were to create a utilitarian AI it would kill the entire human race and replace it with creatures whose preferences are easier to satisfy. And by "easier to satisfy" I mean "simpler and less ambitious," not that the creatures are more mentally and physically capable of satisfying humane desires.
It would not necessarily kill off humanity to replace it by something else, though. Looking at the world right now, many countries run smoothly, and others horribly, even though they are all inhabited and governed by humans. Even if you made the AI "prefer" human beings, it could still evaluate that "fixing" humanity would be too slow and costly and that "rebooting" it is a much better option. That is to say, it would kill all humans, restructure the whole planet, and then repopulate the planet with human beings devoid of cultural biases, ensuring plentiful resources throughout. But the genetic makeup would stay the exact same.
Once someone has been brought into existence we have a greater duty to make sure they stay alive and happy than we do to create new people. There may be some vastly huge amount of happy people that it's okay to kill one slightly-less-happy-person in order to create, but that number should be way, way, way, way, bigger than 1.
Sure. Just add the number of deaths to the utility function with an appropriate multiplier, so that world states obtained through killing get penalized. Of course, an AI who wishes to get rid of humanity in order to set up a better world unobstructed could attempt to circumvent the limitation: create an infertility epidemic to extinguish humanity within a few generations, fudge genetics to tame it (even if it is only temporary), and so forth.
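In symbols, something like

U'(w) = U(w) - λ · deaths(w),

with λ set so large that no realistically attainable gain in U can pay for a single death -- the "way, way bigger than 1" multiplier above.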
Ultimately, though, it seems that you just want the AI to do whatever you want it to do and nothing you don't want it to do. I very much doubt there is any formalization of what you, me, or any other human really wants. The society we have now is the result of social progress that elders have fought tooth and nail against. Given that in general humans can't get their own offspring to respect their taboos, what if your grandchildren come to embrace some options that you find repugnant or disagree with your idea of utopia? What if the AI tells itself "I can't kill humanity now, but if I do this and that, eventually, it will give me the mandate"? Society is an iceberg drifting along the current, only sensing the direction it's going at the moment, but with poor foresight as to what the direction is going to be after that.
I've noticed there does not seem to be much interest in the main question I am interested in, which is, "Why make humans and not something else?"
Because we are humans and we want more of ourselves, so of course we will work towards that particular goal. You won't find any magical objective reason to do it. Sure, we are sentient, intelligent, complex, but if those were the criteria, then we would want to make more AI, not more humans. Personally, I can't see the utility of plastering the whole universe with humans who will never see more than their own little sector, so I would taper off utility with the number of humans, so that eventually you just have to create other stuff. Basically, I would give high utility to variety. It's more interesting that way.
You would only create these viruses if the total utility of the viruses you can create with the resources at your disposal exceeds the utility of the humans you could make with these same resources. For instance, if you give a utility of 1 to a steel paperclip weighing 1 gram, then assuming a simple additive model (which I wouldn't, but that's beside the point) making one metric ton of paperclips has a utility of 1,000,000. If you give a utility of 1,000,000,000 to a steel sculpture weighing a ton, it follows that you will never make any paperclips unless you have less than a ton of steel. You will always make the sculpture, because it gives 1,000 times the utility for the exact same resources.
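A quick sanity check of that arithmetic under the simple additive model (the numbers are the ones from the example):

```python
grams_of_steel = 1_000_000          # one metric ton
paperclip_utility_per_gram = 1      # 1 util per 1 g paperclip
sculpture_utility = 1_000_000_000   # 1e9 utils for the 1-ton sculpture

all_paperclips_utility = grams_of_steel * paperclip_utility_per_gram
print(all_paperclips_utility)                      # 1000000
print(sculpture_utility / all_paperclips_utility)  # 1000.0 -- the sculpture wins
# Paperclips only get made with whatever steel falls short of a full ton.
```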
On the other hand, based on our own experience, broadcasting radio signals is a waste of energy and bandwidth, so it is likely an intelligent society would quickly move to low-power, focused transmissions (e.g. cellular networks or WiFi). Thus the radio "signature" they broadcast to the universe would peak for a few centuries at most before dying down as they figure out how to shut down the "leaks". That would explain why we observe nothing, if intelligent societies do exist in the vicinity. Of course, these societies might also evolve rapidly soon after, perhaps go through some kind of singularity, and might lose interest in "lower life forms" -- which would then explain why they might not look for our signals or leave them unanswered if they listen for them.
Ah, sorry, I might not have been clear. I was referring to what may be physically feasible, e.g. a 3D circuit in a box with inputs coming in from the top plane and outputs coming out of the bottom plane. If you have one output that depends on all N inputs and pack everything as tightly as possible, the signal would still take Ω(sqrt(N)) time to reach it. Of all the physically realizable models of computation, I think that's likely as good as it gets.
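The bound is just geometry: N inputs packed at unit density on the top plane occupy an area on the order of N, i.e. a square of side on the order of sqrt(N), so some input sits at a distance of order sqrt(N) from any fixed output point, and a signal travelling at bounded speed needs

t ≥ c · sqrt(N) = Ω(sqrt(N))

just to cross that distance, no matter how clever the circuit is.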
If the AI is a learning system such as a neural network, and I believe that's quite likely to be the case, there is no source/object dichotomy at all and the code may very well be unreadable outside of simple local update procedures that are completely out of the AI's control. In other words, it might be physically impossible for both the AI and ourselves to access the AI's object code -- it would be locked in a hardware box with no physical wires to probe its contents, basically.
I mean, think of a physical hardware circuit implementing a kind of neuron network -- in order for the network to be "copiable", you need to be able to read the values of all neurons. However, that requires a global clock (to ensure synchronization, though AI might tolerate being a bit out of phase) and a large number of extra wires connecting each component to busses going out of the system. Of course, all that extra fluff inflates the cost of the system, makes it bigger, slower and probably less energy efficient. Since the first human-level AI won't just come out of nowhere, it will probably use off-the-shelf digital neural components, and for cost and speed reasons, these components might not actually offer any way to copy their contents.
This being said, even if the AI runs on conventional hardware, locking it out of its own object code isn't exactly rocket science. The specification of some programming languages already guarantee that this cannot happen, and type/proof theory is an active research field that may very well be able to prove the conformance of implementation to specification. If the AI is a neural network emulated on conventional hardware, the risks that it can read itself without permission are basically zilch.
And technically you can lower that to sqrt(M) if you organize the inputs and outputs on a surface.
There are a lot of "ifs", though.
If that AI runs on expensive or specialized hardware, it can't necessarily expand much. For instance, if it runs on hardware worth millions of dollars, it can't exactly copy itself just anywhere yet. Assuming that the first AI of that level will be cutting edge research and won't be cheap, that gives a certain time window to study it safely.
The AI may be dangerous if it appeared now, but if it appears in, say, fifty years, then it will have to deal with the state of the art fifty years from now. Expanding without getting caught might be considerably more difficult then than it is now -- weak AI will be all over the place, for one.
Last, but not least, the AI must have access to its own source code in order to copy it. That's far from a given, especially if it's a neural architecture. A human-level AI would not know how it works any more than we know how we work, so if it has no read access to itself or no way to probe its own circuitry, it won't be able to copy itself at all. I doubt the first AI would actually have fine-grained access to its own inner workings, and I doubt it would have anywhere close to the amount of resources required to reverse engineer itself. Of course, that point is moot if some fool does give it access...
A huge amount of progress has been made in compilers, in terms of designing languages that implement powerful features in reasonable amounts of computing time; just try taking any modern Python or Ruby or C++ program and porting it to Altair BASIC
The "powerful features" of Python and Ruby are only barely catching up to Lisp, and as far as I know Lisp is still faster than both of them.
No problem is perfectly parallelizable in a physical sense. If you build a circuit to solve a problem, and that circuit is one light year across in size, you're probably not going to solve it in under a year -- technically, the running time of any circuit that decides a problem is at least proportional to the length of its longest wire path, and that length grows with the circuit's physical size.
Now, there are a few ways you might want to parallelize intelligence. The first way is by throwing many independent intelligent entities at the problem, but that requires a lot of redundancy, so the returns on that will not be linear. A second way is to build a team of intelligent entities collaborating to solve the problem, each specializing in one aspect -- but since each of these specialized intelligent entities is much farther from the others than the respective modules of a single general intelligence, part of the gains will be offset by massive increases in communication costs. A third way would be to grow an AI from within, interleaving various modules so that significant intelligence is available in all locations of the AI's brain. Unfortunately, doing so requires internal scaffolding (which is going to reduce packing efficiency and slow it down) and it still expands in space, with internal communication costs increasing in proportion to its size.
I mean, ultimately, even if you want to do some kind of parallel search, you're likely to use some kind of divide-and-conquer technique with a logarithmic-ish depth. But since you still have to pack data in a 3D space, each level is going to take longer to explore than the previous one, so past a certain point, communication costs might outweigh intelligence gains and parallelization might become somewhat of a pipe dream.
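A rough version of that estimate, assuming the divide-and-conquer tree is packed at unit density in 3D and signals travel at bounded speed: a subproblem of size m occupies a volume on the order of m, so merging its result costs on the order of m^(1/3) in signal travel time. Summing over the log(n) levels of the tree,

T(n) ≈ Σ_{k=0..log2(n)} (n / 2^k)^(1/3) = Θ(n^(1/3))

(the geometric sum converges to a constant, roughly 4.85), so past a certain size the physical communication term swamps the logarithmic logical depth.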
Because that's how it works! The system "is" PA, so it will trust (weaker) systems that it (PA) can verify, but it will not trust itself (PA).
That doesn't seem consistent to me. If you do not trust yourself fully, then you should not fully trust anything you demonstrate, and even if you do, there is still no incentive to switch. Suppose that the AI can demonstrate the consistency of system S from PA, and wants to demonstrate proposition A. If the AI trusts S as demonstrated by PA, then it should also trust A as demonstrated by PA, so there is no reason to use S to demonstrate A. In other words, it is not consistent for PA to trust S and not A. Not fully, at any rate. So why use S at all?
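To spell out the step I am leaning on, write T(φ) for "the agent acts on φ". The scheme under discussion amounts to:

(i) PA ⊢ Consistent(S) implies T(Consistent(S));
(ii) T(Consistent(S)) and S ⊢ A implies T(A).

But (i) is just the rule "PA ⊢ φ implies T(φ)" instantiated at φ = Consistent(S). If the agent endorses that rule for Consistent(S), I don't see a consistent ground for refusing it at φ = A, and once it is accepted there, PA ⊢ A already yields T(A) with no detour through S.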
What may be the case, however (and that might be what you're getting at), is that in demonstrating the consistency of S, PA assigns P(Consistent(S)) > P(Consistent(PA)). Therefore, what both PA and S can demonstrate would be true with greater probability than what only PA demonstrates. However, this kind of system solves our problem in two ways: first, it means the AI can keep using PA, but can increase its confidence in some statement A by counting the number of systems that prove A. Second, it means the AI can increase its confidence in PA by building arbitrarily stronger systems and proving the consistency of PA from within those stronger systems. Again, that's similar to what we do.
Not necessarily; this depends on how the system works. In my probabilistic prior, this would work to some degree, but because there exists a nonstandard model in which PA is inconsistent (there are infinite proofs ending in contradictions), there will be a fixed probability of inconsistency which cannot be ruled out by any amount of testing.
That sounds reasonable to me -- the usefulness of certainty diminishes sharply as it approaches 1 anyway. Your paper sounds interesting, I'll give it a read when I have the time :)
If the system did not trust PA, why would it trust a system because PA verifies it? More to the point, why would it trust a self-verifying system, given that past a certain strength, only inconsistent systems are self-verifying?
If the system held some probability that PA was inconsistent, it could evaluate it on the grounds of usefulness, perhaps contrasting it with other systems. It could also try to construct contradictions, increasing its confidence in PA for as long as it doesn't find any. That's what we do, and frankly, I don't see any other way to do it.
Why would successors use a different system, though? Verifying proofs in formal systems is easy, it's coming up with the proofs that's difficult -- an AI would refine its heuristics in order to figure out proofs more efficiently, but it would not necessarily want to change the system it is checking them against.