Let's try a car analogy for the compatibilist position, as I understand it: there is a car, and why does it move? Because it has an engine and wheels and other parts all arranged in a specific pattern. There is no separate "carness" that makes it move ("automobileness" if you will); it is the totality of its parts that makes it a car.
Will is the same: it is the totality of your identity that creates the process by which choices are made. This doesn't mean there is no such thing as will, any more than the fact that a car is composed of identifiable parts means that no car exists. It is just not a basic indivisible thing.
There are no discrete "worlds" and "branches" in quantum physics as such. Once two regions in state space are sufficiently separated to no longer significantly influence each other they might be considered split, which makes the answer to your question "yes" by definition.
The highly specific predictions should be lowered in their probability when updating on the statement like 'unpredictable'.
That depends on what your initial probability is and why. If it is already low due to updates on predictions about the system, then updating on "unpredictable" will increase the probability by lowering the strength of those predictions. Since the destruction of humanity is rather important, even if the existential AI risk scenario is of low probability it matters exactly how low.
This of course has the same shape as Pascal's mugging, but I do not believe that SI claims are of low enough probability to be dismissed as effectively zero.
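To make the point about updating on "unpredictable" concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the function name `posterior`, the numbers, and in particular the choice to model "the predictions are less reliable" by raising the likelihood ratio to a power between 0 and 1. It is one simple way to discount evidence, not a claim about how anyone actually estimates these risks.

```python
def posterior(prior, likelihood_ratio, reliability=1.0):
    """Posterior probability of a hypothesis after one piece of evidence.

    likelihood_ratio < 1 means the evidence counts against the hypothesis.
    reliability in [0, 1] discounts the evidence (an assumed model):
    1.0 takes it at face value, 0.0 ignores it entirely.
    """
    prior_odds = prior / (1.0 - prior)
    effective_lr = likelihood_ratio ** reliability  # pulled toward 1 as reliability drops
    post_odds = prior_odds * effective_lr
    return post_odds / (1.0 + post_odds)


# Hypothetical numbers: a 10% prior for catastrophe, and reassuring
# predictions about the system that count 100:1 against it.
prior = 0.10
reassuring_lr = 0.01

trusted = posterior(prior, reassuring_lr, reliability=1.0)
discounted = posterior(prior, reassuring_lr, reliability=0.5)

print(f"predictions trusted:    {trusted:.4f}")
print(f"predictions discounted: {discounted:.4f}")
```

When the reassuring predictions are trusted less, the posterior moves back up toward the prior, which is the sense in which learning that a system is unpredictable can raise a probability that was low only because of those predictions.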
Not everything is equally easy to describe as equations.
That was in fact my point, which might indicate that we are likely to be talking past each other. What I tried to say is that an artificial intelligence system is not necessarily constructed as an explicit optimization process over an explicit model. If the model and the process are implicit in its cognitive architecture, then making predictions about what the system will do in terms of a search is of limited usefulness.
And even talking about models, getting back to this:
cutting down the solution space and cutting down the model
On further thought, this is not even necessarily true. The solution space and the model will have to be pre-cut by someone (presumably human engineers) who doesn't know where the solution actually is. If the solution lies outside them, a self-improving system will have to expand both in order to find it. A system that can reach a solution even when initially over-constrained is more useful than one that can't, and so someone will build it.
I think you have a very narrow vision of 'unstable'.
I do not understand what you are saying here. If you mean that by "unstable" I have in mind one highly specific trajectory that a system that lost stability will follow, then that is because all the trajectories where the system simply crashes and burns are unimportant. If you have a trillion optimization systems on a planet running at the same time, you have to be really sure that nothing can go wrong.
I just realized I derailed the discussion. The whole question of AGI in a world of specialized AI is irrelevant to what started this thread. As for the order in which they will be developed, I cannot tell how likely it is that AGI could overtake specialized intelligences. It really depends on whether there is a critical insight missing for the construction of AI. If it is just an extension of current software, then specialized intelligences will win for the reasons you state, although some of the caveats I wrote above still apply.
If there is a critical difference in architecture between current software and AI, then whoever hits that insight will likely overtake everyone else. If they happen to be working on AGI, or even on any system entangled with the real world, I don't see how one can guarantee that the consequences will not be catastrophic.
Too much anthropomorphization.
Well, I in turn believe you are applying overzealous anti-anthropomorphization. That is normally a perfectly good heuristic when dealing with software, but the fact is that human intelligence is the only thing in the "intelligence" reference class we have, and although AIs will almost certainly be different, they will not necessarily be different in every possible way. This is especially so considering the possibility of AIs that are either directly based on a human-like architecture or designed to interact directly with humans, which requires having at least some human-compatible models and behaviours.
Just because it doesn't do exactly what you want doesn't mean it is going to fail in some utterly spectacular way.
I certainly agree, and I am not even sure what the official SI position is on the probability of such failure. I know that Eliezer in his writing does give the impression that any mistake will mean certain doom, which I believe to be an exaggeration. But failure of this kind is fundamentally unpredictable, and if a low-probability event kills you, you are still dead. I think the probability is high enough that a Friendly AI type of effort would not be wasted.
(ultimately, for solutions to systems of equations)
That is true in the trivial sense that everything can be described as equations, but when thinking about how the computation actually happens this becomes almost meaningless. If the system is not constructed as a search problem over high-dimensional spaces, then in particular its failure modes cannot be usefully thought about in such terms, even if it is fundamentally isomorphic to such a search.
that'll be a piece of software created with very well defined model of changes to itself
Or it will be created by intuitively assembling random components and seeing what happens, in which case there is no guarantee what it will actually do to its own model, or even what it is actually solving for. Convincing AI researchers to only allow an AI to self-modify when it is stable under self-modification is a significant part of the Friendly AI effort.
Everyone wants artificial rainman.
There are very few statements that are true about "everyone", and I am very confident that this is not one of them. Even if most people with the actual means to build one want specialized and/or tool AIs, you only need one successful unfriendly AGI project to potentially cause a lot of damage. This is especially true as hardware costs fall and more AI knowledge is developed and published, lowering the entry costs.
I don't see why expect general intelligence to suddenly overtake specialized intelligences;
To be dangerous, AGI doesn't have to overtake specialized intelligences; it has to overtake humans. The existence of specialized AIs is either irrelevant or increases the risks from AGI, since they would be available to both humans and AGIs, and presumably AGIs would have lower interfacing costs.
Just because software is built line by line doesn't mean it automatically does exactly what you want. In addition to outright bugs, any complex system will have unpredictable behaviour, especially when exposed to real-world data. Just because the system can restrict the search space sufficiently to achieve an objective doesn't mean it will restrict itself only to the parts of the solution space the programmer wants. The basic purpose of the Friendly AI project is to formalize the human value system sufficiently that it can be included in the specification of such a restriction. The argument made by SI is that there is a significant risk that a self-improving AI can increase in power so rapidly that, unless such a restriction is included from the outset, it might destroy humanity.
Long before you have to worry about the software finding an unintended way to achieve the objective, you encounter the problem of software not finding any way to achieve the objective
Well, obviously, since it is pretty much the problem we have now. The whole point of Friendly AI as formulated by SI is that you have to solve the former problem before the latter is solved, because once the software can achieve any serious objectives it will likely cause enormous damage on its way there.
As often happens, it is to quite an extent a matter of definitions. If by an "end" you mean a terminal value, then no purely internal process can change that value, because otherwise it wouldn't be terminal. This is essentially the same as the choice of reasoning priors, in that anything that can be chosen is, by definition, not a prior, but a posterior of the choice process.
Obviously, if you split the reasoning process into sections, then the posteriors of certain sections can become the priors of the sections that follow. Likewise, certain means can be more efficiently thought of as ends, and in this case rationality can help you determine what those ends would be.
The problem with humans is that the evolved brain cannot directly access either core priors or terminal values, and there is no guarantee that they are even coherent enough to be said to properly exist. So every "end" that rises high enough into the conscious mind to be properly reified is necessarily an extrapolation, and hence not a truly terminal end.