Would uploads avoid self improvement? If we are going to try to address this question, we should first consider the plausibility and importance of the whole upload concept.
Given the power and relatively young age of some Silicon Valley executives who seem to see uploading as part of their future, we might want to check to see whether the pursuit of uploading would have any side-effects.
If we believe that uploads are malleable and improvable, then the technology to create uploads would also permit the creation of more powerful minds, with all the consequences.
Uploads and those creating a WBE-like entity as progeny most likely would prefer to add improvements to a greater or lesser extent, rather than complete fidelity.
Some people may argue that WBEs should lead as natural an existence as possible, one very much like people.
On the assumption that these people value their uploads or progeny, however, some aspects of life experience would be edited out. For example, what would motivate one of these creators to pass their WBEs through an unpleasant end-of-life experience, like vascular dementia?
The emulated lives of uploads and progeny would, to a greater or lesser extent, be edited. We could try to reason more about that.
Suppose that emulations will be created to study how the brains of flesh-and-blood people work in general, or to study and forecast how a particular, living person will react to stimulus.
This is a reasonable application of high-fidelity whole-brain emulation. To use such emulations to forecast behavior, though, the emulation would have to be "run" on a multi-dimensional distribution of possible future sets of environmental stimuli. The variation in these distributions grows combinatorially, so even tens of thousands of runs would only provide some information about what the person is likely to do next.
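A rough sketch of the combinatorial point, assuming purely for illustration that each future time step offers a fixed number of distinct stimuli. The function names and all of the numbers are hypothetical; real environments would not factor this cleanly.

```python
def scenario_count(stimuli_per_step: int, steps: int) -> int:
    """Number of distinct stimulus sequences of the given length."""
    return stimuli_per_step ** steps


def coverage(runs: int, stimuli_per_step: int, steps: int) -> float:
    """Largest fraction of all sequences that `runs` distinct emulation runs could cover."""
    return min(1.0, runs / scenario_count(stimuli_per_step, steps))


if __name__ == "__main__":
    runs = 50_000  # "tens of thousands of runs" (illustrative figure)
    for steps in (5, 10, 20):
        total = scenario_count(10, steps)
        frac = coverage(runs, 10, steps)
        print(f"{steps:>2} steps: {total:.3e} possible scenarios, coverage {frac:.3e}")
```

Under these assumptions the covered fraction collapses as the forecast horizon lengthens, which is why the runs would yield only partial information about what the person is likely to do next.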
Such WBEs would be only one tool in a toolbox to predict human behavior. However, they would be useful for that purpose. Your WBE could be fed many possible future lives, allowing you to make better choices about your future in the physical world, if using WBEs in that manner was considered ethical.
People on this site generally seem to agree, though, that using a high-fidelity WBE as a guinea pig to test out life scenarios is ethically problematic. If these life scenarios were biased in favor of delivering positive outcomes to the WBEs, maybe we would not have as much of a problem with that. Perhaps the interaction of two WBEs could be observed over many scenarios, allowing people to better choose companions.
WBEs could end up being used for this purpose, ethical or not. Again, though, I suspect that more data about people's reactions could be gained if modified WBEs were used in some of the tests.
It's worth exploring, but high-performance neuromorphic or algorithmic minds would still be the better choice for actually controlling physical conditions.
There is a tremendous amount of good material in here, thanks...
The thing that I would like to see added is a perspective on how changeable, or malleable, WBEs would be, once created.
One of the main reasons I am challenging WBEs is because I think that brain emulations would be very easy to change, alter and improve along the axes of defined performance metrics.
If they will be highly malleable, those who wish to use them to generate productivity would instead use improved (neuromorphic) versions. Additionally, a WBE which had some control of its own make-up would consider improving itself.
I believe that we can show that WBEs would be very readily improvable and changeable. Once they are improved and changed enough, they are no longer WBEs and are instead neuromorphic entities.
For the next few years and possibly decades, the development of brain emulation technology will occur alongside the development of neuromorphic technology.
Some teams will be primarily focused on achieving extremely accurate renditions of sections of actual brain tissue, as well as increasingly accurate neural maps which are sometimes based on high-throughput scans of actual brain tissue. These teams will wish to base their work on individual neurons and glia that are very much like actual cells.
However, Henry Markram, director of Europe's Human Brain Project, has asserted that we need not model anything like the full complexity of gene expression and protein formation in human neurons in order to accurately represent firing patterns. Those pursuing the path toward WBEs will be willing to compromise on the issue of the level of detail at which individual cells are modeled, to varying degrees. Perhaps some will discover ways to measure whether these simplifications generate a statistically significant difference in how the simulated brain might react to stimulus.
Other teams, more concerned with using models of groups of neurons as a calculation tool, may be less concerned with whether they accurately represent individual neurons. "Neural net" technology was not intended to accurately model the brain, and the individual elements in a neural net are nothing like cells.
Nonetheless, these teams will learn everything they can from those who are trying to simulate actual brain function, and some of the same people will work in both sub-disciplines at different points in their careers.
If people from the simulation and human connectome camps develop an understanding of some new aspect of brain function, those who are just trying to find new AI methods to build into software will be able to take advantage of the results. However, they may be able to shave a lot of compute cycles out by applying abstractions of the newly-realized insight about structure and function to idealized neurons and glia that do not really try to approximate the function of living tissue much at all.
We cannot entirely predict whether extremely detailed models of individual cells are necessary for neuromorphic AI. However, I am interested in whatever evidence is available.
Engineers attempting to improve either a WBE or a piece of neuromorphic tissue would have considerable advantages that are unavailable to medical teams working with actual brains and nerves.
Medical teams who work to repair spinal injuries are able to stimulate nerve fibers and trace the nerves into the brain. However, a vast set of experimental tools would be available to WBE or Neuromorphic Engineers.
These engineers would be able to write programs which cause any specific neuron or group of neurons to fire at any time. They would be able to select the firing pattern for each, and the relative timing of a group of neurons.
They would be able to configure neurotransmitter output at will, and, importantly, they would also be able to set the number of neurotransmitter receptors on cell surfaces.
Altering the concentration of cell surface receptors would, for example, allow the neuromorphic tissue engineers to greatly influence what stimuli are pleasurable. They would be able to set patterns for these cell surface receptors which never occur in the natural course of gene expression in the brain.
We have already done a fair amount of mapping of the neural basis of pain and pleasure. Forthcoming in the next ten years, we will also have results from NIH's Human Connectome Project. Neuromorphic tissue engineers will begin their work with vast resources of data on the generation of pain and pleasure, and on the purpose and use of these sensations.
If they had either a WBE, or a differently-configured piece of neural tissue available to them, seemingly they would have a strong ability to re-wire what causes pain and pleasure in order to suit their needs.
Such techniques alone could allow a WBE to cross the line from an accurate representation of a human mind to something fundamentally different.
Apparently, an advantage of creating a thread with a controversial and heterodox first entry is that for a time you get to write all of it yourself! :) That's OK, because I have a fair amount of brain dumping to do on this subject.
Brain grafts are a very difficult idea in actual brain tissue today.
One of the key reasons, however, will begin to become a non-factor: tissue rejection. We can now grow neurons that have the same genetic code as yours or mine in the lab (I actually did this.) A method is to turn induced pluripotent stem cells (iPSCs) which may have been created from your own skin, into nerve cells.
I grew a small plate of such cells. I did not try to distinguish which among them were neurons, and which were glia. I am not sure how far along we are toward growing a complete neural column, or a section of brain.
Assuming the neurons were grown, however, installation would still be very difficult. I am not willing to say impossible, but we have some challenges.
The balance and configuration of the glia would be difficult to control. Blood flow through both large vessels and capillaries would have to be restored to the added section.
Another issue, perhaps an important one, is that neurons in the brain have long axons. The "white matter" of the brain contains portions of these axons that string from brain region to brain region. It is a tangled net. Replacement neurons might have to be literally "woven in" to this net. Advantageously, the axons that are already there are sometimes bundled, but they are stuck together. It is not like stripping a large wire and seeing many filaments pop out.
Physically "weaving" new neurons into the brain is a lot more challenging than weaving them into a WBE or a piece of neuromorphic tissue.
At any given point in the early history of neuromorphic engineering, there will be a greater or lesser understanding of the relationship between structure and function. However, using WBEs and neuromorphic tissue in experiments to try to elicit function from structure will be very inexpensive. Tens of thousands or billions of experiments could be run with a single set of macros.
For this reason, I forecast, with considerable but not complete certainty, that the existence of either WBEs or functional neuromorphic tissue would quickly lead to a much greater understanding of the relationship between structure and function.
Can we be absolutely certain that this understanding would very quickly permit designs of purpose-built brain configurations that improve along the dimensions of particular performance metrics? We should try to build the case for and against that hypothesis. My instinct is to believe that these experiments would facilitate mind design, but people could present other evidence.
The future of AI will come out very differently if sections of neural tissue cannot be made to function usefully, separately from a WBE.
Similarly, the future of AI will come out very differently if removing parts of the brain from an emulation causes the brain to become non-functional.
We know from studies of stroke and other forms of brain damage that brain function does not immediately degrade if a small section of brain is injured. Therefore, removing sections from a WBE might reduce the functionality of the WBE, but would not diminish it entirely.
There is no precedent for adding sections of brain matter to an existing brain. If such operations were performed on a WBE, however, the changes would be very different than they would be to actual brain tissue.
If we are able to conclude that alteration or removal of part of the WBE would be desirable for the purposes of the emulation's controllers, then we should conclude that WBE technology in a sense flows into neuromorphic technology, and is not separate from it in a fundamental way.
Even without extending the definition of neuromorphic, a WBE with a high-speed link to algorithms is clearly neuromorphic once significant portions of the neural simulation components are altered or removed.
A human WBE could have a very high-speed link, either with conventional computers running algorithms which the WBE triggers regularly, or with other WBEs.
If these links were sufficiently fast and robust, then we would do best to analyze the cognitive capacity of the system of the WBE and the links taken together, rather than thinking of them as separate units.
At a certain point, linking a WBE to many other software tools creates an enhanced system which is very different from a human mind. Whether we call the combined system neuromorphic or just highly enhanced is a question of definitions. However, the combined system could develop to the point where it is very different than an ordinary person or team of people who can call on a powerful computer to calculate a result.
Today, people are able to input data into calculating machines through speech and gestures, including drawing and typing.
Additionally, machines can gather biomarker data produced by the person. We can also issue a simple command to transmit a large block of previously prepared data.
These input mechanisms have certain potential disadvantages:
-They are somewhat inaccurate.
-They are slow. (Although triggering a larger file transmission makes up for the speed deficit, under many circumstances.)
We can receive more information through sensory input than we can transmit out.
Additionally, at what point does such a combination cease to be more like a human mind-computer interface and instead require re-classification as a neuromorphic or otherwise novel entity?
Unless, prior to the emergence of neuromorphic AI, forms of AI that do not include neurologically-inspired elements become more dominant.
So, if we can establish that progress in emulation technology will quickly result in functional, malleable products, then for the most part future productivity will be generated by purpose-built neuromorphic computing resources rather than by human-like WBEs.
These characteristics are more available to "The Boss" if "The Boss" considerably alters a malleable emulation.
Such an altered emulation is now neuromorphic.
Thus: if one or more "Bosses" is constructing a workforce, these "Bosses" will prefer neuromorphic components over whole-brain emulations.
Thus, if emulations are sufficiently malleable, there is no economy of whole-brain emulations: There is an economy of neuromorphic computing resources.
If the technology is available, "The Boss" will prefer that its work force have high-speed connections to other computing resources. "The Boss" will also prefer that its work force have high-speed connections to whatever sensory input is relevant to the task.
"The Boss" can get more done if it can create new workers, and turn them on and off at will, without ethical or regulatory constraints.
If the technology is available, "The Boss" will prefer to employ cognitive capacity which has no personhood, and to which it has no ethical obligation.
If the emulation is controlled by "The Boss," what incentives does "The Boss" have?
-to increase the emulation's throughput and efficiency
-to increase the emulation's focus on the task that generates value
-to avoid activities by regulators, protesters or other outsiders which could cause work stoppages.
AABoyles also begins to address another important and much-discussed question:
Can the emulation interface with:
Sensory inputs unavailable to the human brain?
Reasoning, calculation, memory modules and other minds in more direct ways?
Rather than inputting data into computers and observing the outputs of computers and sensory devices as we do today.
Just laying some more groundwork... One distinction the discussion requires:
Who is in control of the components and the environment of the emulation?
Possibilities:
-An outside entity, attempting to gain economic or other value by using the emulation to complete information processing tasks. (I'll call this "The Boss.")
-The environment was established to maintain the emulation, which is not "given a job," but was created for scientific observation by outsiders.
-The emulation is not given a job, but the environment was created by outsiders as a platform for experimentation on emulations.
-Perhaps the emulation was created as an "upload" of a person, or as their designed child or progeny.
-The emulation has a greater or lesser degree of control over its own environment or composition.
Example of lesser degree of control: It can decide to select some of the content it sees and listens to.
Example of greater degree of control: It can directly alter one of its emotions by "twisting a knob."
Obviously, replacing neural components with others could create an emulation which diverges from the human mind, becoming more and more neuromorphic.
Seemingly, controlling sensory inputs, and controlling the blood supply would permit a vast degree of control over the activity of the brain.
Another aspect of malleability: How much can the structure and activity of the brain be influenced using means that we presently consider external or environmental? These influences would include sensory inputs and inputs through the blood.
One aspect of malleability: At a specific point in the forecast timeline, how easy or difficult is it to create an emulation with a replacement subsystem or component that is functional, but functions differently? Does the emulation continue to work if these sub-systems are replaced or altered to a lesser or greater degree?
One aspect of what I call "fidelity" is the degree to which the emulation incorporates various aspects of neurophysiology.
For example, the emulation might or might not incorporate:
-Good models of fluid flow within the brain, and between the brain, the blood and the cerebrospinal fluid.
-Good models of the components of the blood itself, and how these components would influence brain activity.
Future timelines could assume that a great deal of additional knowledge about structure and function of components of the brain is developed before functional WBEs are developed.
Or, perhaps scanning technology improves rapidly, allowing for higher and higher levels of fidelity, but our knowledge of how the brain actually works does not advance as rapidly.
We could imagine a timeline where extremely high fidelity emulations that function are created without functional, low-fidelity emulations being created first.
Or, we could imagine a stepwise process where lower-fidelity emulations that function are created first, then these are "improved" to the point that they represent the workings of the human mind more and more accurately.
Another distinction:
Many people have worked to reason about the level of "fidelity" of WBEs. That is to say, how near is a WBE to being an accurate representation of a human brain: what does it leave in, and what does it leave out?
In order to analyze the future of brain emulations further, I want to begin to add some distinctions:
The level of "malleability" of an emulation represents the degree to which its progress through time can be influenced by specific attempts to change it or control its environment.
The precision of this distinction needs to be increased, and people can comment about that here.
I wish to see whether we can show that human whole-brain emulations will be essentially neuromorphic in a great many ways.
Almost as soon as they exist, something more effective and productive will become available.
The hypothesis is that human Whole-Brain Emulation will not be a recognizable stage in the development of AGI that lasts for any significant amount of time. Also, an "algorithmic economy" of human whole-brain emulations is highly unlikely to be anything but science fiction.
The goal is to examine whether there are some fundamental flaws in the nature of this forecast.
I will lay out the case after more opinions and reading material are available to us...
I am keen to explore WBEs of other animals, but let's focus on humans and their plausible successors for the moment.
There are complications well worth considering, of course...an animal mind could be economically productive, and a human-animal chimera emulation might seem to be one plausible successor....evidence for that forecast could also be developed...
So many questions...
Technology which can predict whether an action would be approved by a person or by an organization is:
-Practical to create, first applied to test cases, then to limited circumstances, then in more general cases.
-For the test cases and for the limited circumstances, it can be created using some existing machine learning technology without deploying full-scale natural language processing.
-Approval/disapproval is a binary value, and appropriate machine learning approaches would include logistic regression or decision-tree and random-forest methods. We create a model using training data, and the model may output P(approval | conditions). The model is not that different from one used to predict a purchase or a variety of other online behaviors.
-A system which could forecast approval and disapproval would be useful to PEOPLE, well before it became useful as a basis for selecting AI motivations.
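The modeling step in the list above can be sketched with a tiny logistic regression written in plain Python. Everything specific here is an assumption made for illustration: the two "conditions" (whether the action is legal, and a harm score between 0 and 1), the synthetic labeling rule, and the function names. Real approval data would come from people's recorded judgments, not from a rule like this.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_approval(weights, bias, conditions):
    """Model estimate of P(approval | conditions)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, conditions)))


def train(examples, labels, epochs=500, learning_rate=0.5):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(examples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            err = predict_approval(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for approval records: [is_legal, harm] -> approved (1) or not (0)."""
    rng = random.Random(seed)
    examples, labels = [], []
    for _ in range(n):
        is_legal = 1.0 if rng.random() < 0.7 else 0.0
        harm = rng.random()
        # Hypothetical labeling rule: approve only legal, low-harm actions.
        labels.append(1 if (is_legal == 1.0 and harm < 0.5) else 0)
        examples.append([is_legal, harm])
    return examples, labels


if __name__ == "__main__":
    examples, labels = make_toy_data()
    weights, bias = train(examples, labels)
    print("legal, low harm :", predict_approval(weights, bias, [1.0, 0.1]))
    print("illegal, harmful:", predict_approval(weights, bias, [0.0, 0.9]))
```

The same shape of model is what one would use to predict a purchase; only the features and the labels change.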
Predicting whether people would approve of a particular action is something that we could use machine learning for now.
These approaches advance the idea from a theoretical construct to an actual, implementable project.
Thanks to Paul for the seed insight.
In addition to determining whether an action would be approved using a priori reasoning, an approval-directed AI could also reference a large database of past actions which have either been approved or disapproved.
Alternatively, in advance of ever making any real-world decision, the approval-directed AI could generate example scenarios and propose actions to people deemed effective moral reasoners many thousands of times. Their responses would greatly assist the system in constructing a model of whether an action is approvable, and by whom.
A lot of approval data could be created fairly readily. The AI can train on this data.
Paul, I think you're headed in a good direction here.
On the subject of approval-directed behavior:
One broad reason people and governments disapprove of behaviors is that they break the law or violate ethical norms that supplement laws. A lot of AGI disaster scenarios seem to incorporate some law-breaking pretty early on.
Putting aside an advanced AI that can start working on changing the law, shouldn't an approval-directed AI, among other things, constantly check whether its actions are legal before taking them?
The law by itself is not a complete set of norms of acceptable behavior, and violating the law may be acceptable in exceptional circumstances.
However, why can't we start there?
That's pretty cool. Could you explain to me how it does not cause us to kill people who have expansive wants in order to reduce the progress toward entropy which they cause?
I guess in your framework the goal of Superintelligence is to "Postpone the Heat Death of the Universe" to paraphrase an old play?
The picture of Superintelligence as having and allowing a single value system is a Yudkowsky/Bostrom construct. They go down this road because they anticipate disaster along other roads.
Meanwhile, people invariably will want things that get in the way of other people's wants.
With or without AGI, some goods will be scarce. Government and commerce will still have to distribute these goods among people.
For example, some people will wish to have as many children or other progeny as they can afford, and AI and medical technology will make it easier for people to feed and care for more children.
There is no way to accommodate all of the people who want as many children as possible exactly when they want them.
What values scheme successfully trades off among the prerogatives of all people who want many progeny? After a point, if they persist in thinking this, the many people who share this view eventually need to compromise through some mechanism.
The child-wanters will also be forced to trade off their goals with those who hope to preserve a pristine environment as much as possible.
There is no reconciling these people's goals completely. Maybe we can arbitrate between them and prevent outcomes which satisfy nobody. Sometimes, we can show that one or another person's goals are internally inconsistent.
There is no obvious way to show that the child-wanter's view is superior to the environment-preserver's view, either. Both will occasionally find themselves in conflict with those people who personally want to live for as long as they possibly can.
Neither AGI nor "Coherent Extrapolated Volition" solves the argument among child-wanters, nor does either solve the argument between child-wanters, environment-preservers and long-livers.
Perhaps some parties could be "re-educated" or medicated out of their initial belief and find themselves just as happy or happier in the end.
Perhaps at critical moments before people have fully formulated their values, it is OK for the group to steer their value system in one direction or another? We do that with children and adults all of the time.
I anticipate that IT and AI technology will make value-shifting people and populations more and more feasible.
When is that allowable? I think we need to work that one out pretty well before we start up an AGI which is even moderately good at persuading people to change their values.
Having a social contract with your progenitors seems to have some intergenerational survival value. I would offer that this social contract may even rank as an instrumental motivation, but I would not count on another intelligence to evolve it by itself.
Typically, while some progenitors value their descendants enough to invest resources in them, progenitors will not wish to create offspring who want to kill them or technology which greatly assists their descendants in acting against the progenitor's goals. (There are exceptions in the animal world, though.)
This guideline for a social contract between progenitors and offspring would hold true for our relationship with our children. It would also hold true for our relationship with non-human intelligence and our relationship with enhanced people we help to bring into the world.
In turn, should enhanced people or superintelligent machines themselves become progenitors, they also will not wish their descendants, or, if they are unitary systems, later versions of themselves, to act very much against their goals.
Programming goals in a way that values progenitors seems worthy, both for our immediate progeny, and for their progeny.
Stunting, tripwires, and designing limitations on the AI's goals and behaviors may be very powerful tools.
We are having a hard time judging how powerful they are because we do not have the actual schemes for doing so in front of us to judge.
Until engineering specifications for these approaches start to be available, the jury will still be out.
We certainly can imagine creating a powerful but stripped-down AGI component without all possible functionality. We can also conceive of ways to test it.
Just to get the ball rolling, consider running it one hundred times more slowly than a human brain while it was being tested, debugged and fed various kinds of data.
Consider running it in a specially-designed game world during testing.
We would be able to make some inferences about how likely the system would be to break the law or otherwise become threatening in real life if it did so in the game world.
If it became threatening under test conditions, that increases the chance that it would become threatening in real-life. Re-design and repeat the tests.
On the other hand, if it did not become threatening under some test conditions, that alone is not enough to prove that it is time to use it in the real-world. So, the developers continue to test and analyze.
The trick, then, is creating a risk assessment process to check the safety of heavily tested AGI components which passed all tests.
The stunting and tripwires would have to go through a validation process before they could be verified as sufficient to permit safe testing.
Just as a heuristic, perhaps we are looking for probabilities of danger below one in a million or one in a trillion.
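A minimal Bayesian sketch of why passing tests lowers the estimated danger without proving safety. The prior and the two pass rates are invented numbers, the tests are assumed independent, and a dangerous system is assumed to pass no more often than a safe one; the function names are hypothetical.

```python
def posterior_dangerous(prior, p_pass_if_dangerous, p_pass_if_safe, passes):
    """P(dangerous | system passed `passes` independent tests), via Bayes' rule in odds form.

    Assumes 0 < prior < 1 and 0 < p_pass_if_dangerous <= p_pass_if_safe.
    """
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = (p_pass_if_dangerous / p_pass_if_safe) ** passes
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def passes_needed(prior, p_pass_if_dangerous, p_pass_if_safe, target, max_passes=100_000):
    """Smallest number of passed tests that pushes the posterior below `target`.

    Returns None if that never happens within `max_passes`.
    """
    for passes in range(max_passes + 1):
        if posterior_dangerous(prior, p_pass_if_dangerous, p_pass_if_safe, passes) < target:
            return passes
    return None


if __name__ == "__main__":
    # Illustrative assumption: a dangerous system still passes 90% of tests.
    for passes in (0, 10, 100):
        print(passes, posterior_dangerous(0.5, 0.9, 1.0, passes))
    print("passes needed for one in a million:", passes_needed(0.5, 0.9, 1.0, 1e-6))
    # If a dangerous system can always fake a pass, tests carry no information.
    print("with perfect faking:", passes_needed(0.5, 1.0, 1.0, 1e-6, max_passes=1000))
```

The last case is the uncomfortable one: if a threatening system would pass every test anyway, no number of passed tests moves the probability, which is why the validation of the tests themselves matters.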
Or, perhaps we are comparing this probability with the probability of a different catastrophic or existential risk which the AGI can help to mitigate. However, building an AGI in response to some other emergency, the way we built the atomic bomb, seems like a recipe for trouble.
Specifically, the possibility that the AGI will be able to lengthen the life or permit upload of a small group of important people IS NOT enough. Saving the life of The President or an important corporate executive is not enough reason to create AGI.
The development team only escalates to the next level of testing and utilization when danger probabilities fall sufficiently.
Furthermore, the development team is NOT responsible for the risk assessment. That job belongs to another team(s) with a designed process for providing oversight.
If the probability of danger never falls enough, the AGI development team is not allowed to continue.
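The escalation rule described above could be sketched as a simple gate. The stage names, the thresholds, and the idea of passing the oversight team's estimate in as a callable are all assumptions for illustration; the only point is that the development team does not supply the number it is judged by.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    name: str
    max_danger_probability: float  # hypothetical threshold set by the oversight process


def stages_cleared(stages: List[Stage], oversight_estimate: Callable[[Stage], float]) -> int:
    """Count how many consecutive stages the project may enter.

    `oversight_estimate` stands in for the independent risk-assessment team.
    Escalation stops at the first stage whose estimated danger is too high.
    """
    cleared = 0
    for stage in stages:
        if oversight_estimate(stage) > stage.max_danger_probability:
            break
        cleared += 1
    return cleared


if __name__ == "__main__":
    stages = [
        Stage("slowed-down sandbox", 1e-3),
        Stage("game world", 1e-6),
        Stage("real world", 1e-9),
    ]
    # Made-up estimates from the oversight team.
    estimates = {"slowed-down sandbox": 1e-4, "game world": 1e-7, "real world": 1e-6}
    print(stages_cleared(stages, lambda s: estimates[s.name]))
```

A result of zero corresponds to the case where the development team is not allowed to continue at all.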
Right now, such danger probabilities are too high. However, trying to design stunting and tripwire schemes will provide information that allows us to update the probabilities.
This perfect utility function is an imaginary, impossible construction. It would be mistaken from the moment it is created.
This intelligence is invariably going to get caught up in the process of allocating certain scarce resources among billions of people. Some of their wants are orthogonal.
There is no doing that perfectly, only well enough.
People satisfice, and so would an intelligent machine.
So we are considering a small team with some computers claiming superior understanding of what the best set of property rights is for the world?
Even if they are generally correct in their understanding, by disrespecting norms and laws regarding property, they are putting themselves in the middle of a billion previously negotiated human-to-human disputes and ambitions, small and large, in an instant. Yes, that is foolish of them.
Human systems like those which set property rights either change over the course of years, or typically the change is associated with violence.
I do not see a morally superior developer + AGI team working so quickly on property rights in particular, and thereby setting off a violent response. A foolish development team might do that, but a wise team would roll the technology and the wrenching changes out gradually.
If they really are morally superior, they will first find ways to grow the pie, then come back to changing how it gets divided up.
I hear you.
The issue THEN, though, is not just deterring and controlling an early AGI. The issue becomes how a population of citizens (or an elite) control a government that has an early AGI available to it.
That is a very interesting issue!
Just to go a bit further with Pinker, as an exercise try for once to imagine a Nurturing AGI. What would it act like? How would it be designed?
On infrastructure profusion:
What idiot is going to give an AGI a goal which completely disrespects human property rights from the moment it is built?
Meanwhile, an AGI that figured out property rights from the internet would have some idea that if it ignored property rights, people would want to turn it off. If it has goals which were not possible to achieve once turned off, then it would respect property rights for a very long time as an instrumental goal.
And I do believe we should be able to turn off an off-the-grid AGI running on a limited amount of computing resources whose behavior was previously simulated many times.
So, we seem to be getting closer to being willing to test disabled AGIs and AGI components, if we can avoid people misusing them.
On the Pinker excerpt:
He is part way to a legitimate point.
The distinction is not between male and female. Instead, the issue is whether to design a mind around the pursuit of a mathematically optimal single objective.
Pinker is right that singlemindedly pursuing a single, narrow objective would be psychotic for a person.
Meanwhile, Omohundro points out that the amount of computing time required to use a computerized optimization method to make decisions explodes as more knowledge about the real world is built into the optimization.
Herbert Simon, meanwhile, points out that people do not optimize; they SATISFICE. They choose an answer to the problem at hand which is NOT OPTIMAL, but is GOOD ENOUGH using heuristics, then they move on to solve the next problem.
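A small sketch of the contrast, assuming options arrive one at a time and each costs one evaluation to score. The function names, the aspiration level `good_enough`, and the toy scoring function are hypothetical.

```python
def optimize(options, score):
    """Score every option and return (best option, number of evaluations)."""
    best, best_score, evaluated = None, None, 0
    for option in options:
        evaluated += 1
        s = score(option)
        if best_score is None or s > best_score:
            best, best_score = option, s
    return best, evaluated


def satisfice(options, score, good_enough):
    """Return the first option scoring at least `good_enough`, then stop looking.

    Returns (None, evaluations) if no option clears the bar.
    """
    evaluated = 0
    for option in options:
        evaluated += 1
        if score(option) >= good_enough:
            return option, evaluated
    return None, evaluated


if __name__ == "__main__":
    options = range(1, 1001)
    score = lambda x: x % 97  # arbitrary toy payoff
    print("optimize :", optimize(options, score))
    print("satisfice:", satisfice(options, score, good_enough=50))
```

The satisficer accepts a worse answer in exchange for far fewer evaluations, and is then free to move on to the next problem.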
In finance, meanwhile, when constructing a portfolio, you do not exactly optimize a single objective (see Sharpe and Markowitz). If anything, you optimize a combination of risk and reward.
To resolve just these two orthogonal metrics into a single utility function requires a lot of cognitive labor: you have to figure out the decision-maker's level of "risk aversion." That is a lot of work, and frequently the decision-maker just rebels.
So now you're trying to optimize this financial portfolio with the two orthogonal metrics of risk and reward collapsed into one. Are you going to construct a set of probability distribution functions (pdf) over time for every possible security in your portfolio? No, you're going to screen away a lot of duds first and think harder about the securities which have a better chance of entering the portfolio.
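One common way to collapse risk and reward into a single number is a mean-variance score, expected return minus a risk-aversion penalty on variance. The sketch below uses that form together with a cheap screening pass; the security names, return figures, screening threshold, and function names are all made up, and it ranks securities one at a time rather than optimizing a full portfolio with correlations.

```python
from statistics import mean, pvariance


def utility(returns, risk_aversion):
    """Mean-variance score: expected return minus a penalty proportional to variance."""
    return mean(returns) - 0.5 * risk_aversion * pvariance(returns)


def screen(candidates, min_mean_return=0.0):
    """Cheap heuristic filter: drop obvious duds before any careful analysis."""
    return {name: r for name, r in candidates.items() if mean(r) >= min_mean_return}


def rank(candidates, risk_aversion, min_mean_return=0.0):
    """Screen first, then order the survivors by mean-variance utility (best first)."""
    shortlisted = screen(candidates, min_mean_return)
    return sorted(shortlisted, key=lambda n: utility(shortlisted[n], risk_aversion), reverse=True)


if __name__ == "__main__":
    # Hypothetical historical returns per period.
    candidates = {
        "steady": [0.04, 0.05, 0.06],
        "volatile": [-0.20, 0.10, 0.40],
        "dud": [-0.05, -0.02, -0.01],
    }
    for risk_aversion in (1.0, 4.0):
        print(risk_aversion, rank(candidates, risk_aversion))
```

The ranking flips as the risk-aversion parameter changes, which is the cognitive labor mentioned above: the single utility function is only as good as the estimate of how risk-averse the decision-maker really is.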
When thinking about mind design, and just when thinking effectively, always incorporate:
-Bounded rationality
-Diversification
-Typically, some level of risk aversion.
-A cost to obtaining new pieces of information, and value of information
-Many, complex goals.
Many of these goals do not require very much thinking to determine that they are "somewhat important."
Suppose we have several little goals (such as enough food, avoid cold, avoid pain, care for family, help our friends, help humanity).
We will expend a lot of effort resolving them from orthogonal metrics into a single goal. So instead, we do something like automatically eat enough, avoid cold and avoid pain, unless some exception is triggered. We do not re-balance these factors every moment.
That process does not always work out perfectly, but it works out better than complete analysis paralysis.
A SENSIBLY DESIGNED MIND WOULD NOT RESOLVE ALL ORTHOGONAL METRICS INTO A SINGLE OBJECTIVE FUNCTION, nor try to assess a pdf about every possible fact.
DROP THE PAPERCLIP MAXIMIZERS ALREADY. They are fun to think about, but they have little to do with how minds will eventually be designed.
This is an entire direction of research which deserves vastly more than a single throwaway line in one blog. There should be a whole thread just about this, then a proposal, then a research team on it.
I hear you and kind of agree. On the other hand, when a 3-year-old lies, sometimes they do pull it off.