Comments
Given only finite time, isn't one always omitting nearly everything? If you believe in dishonesty by omission, isn't everyone dishonest, in that sense, nearly all the time? You can argue that only "relevant" information is subject to non-omission, but since relevance is a subjective and continuous property, this doesn't seem like very useful guidance. Wherever you choose to draw the line, someone can reasonably claim you've omitted relevant (by some other standard) information just on the other side of it.
It seems like this discussion should say more about power imbalances between speaker and listener. For example, in the border agent example, a border control agent has vastly more power than someone trying to enter the country. This power gives them the "right" (read: authority) to ask all sorts of questions, the legitimacy of which might be debatable. Does deep honesty compel you to provide detailed, non-evasive answers to questions you personally don't believe the interlocutor has any business asking you? The objective of such interactions is not improving the accuracy of the border agent's worldview, and it is unlikely that anything you say is going to alter that worldview. There are many situations in life where you have little choice but to interact with someone, yet the less you tell them the better. There's a reason witnesses testifying in a courtroom are advised to answer "yes" or "no" whenever possible, rather than expounding.
In software engineering, things often become "accidentally load bearing" when people don't respect interfaces. If they go digging around in a component's implementation, they learn things that happen to be true but were never intended as guarantees. When they start relying on these things, it limits the maintainer's ability to make future changes to the component. This problem is exacerbated by under-specified interfaces: either formal specification mechanisms are underutilized or, more often, given the limits of most formal interface specification mechanisms, important behavioral aspects of an interface simply never get documented.
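A minimal sketch of what I mean (the module, function names, and contract are all made up for illustration): the documented interface only promises a list of IDs, but the current implementation happens to return them sorted, and a caller that depends on that order is accidentally load bearing on an implementation detail.

```python
# --- component implementation ---
def get_user_ids(db: dict[str, int]) -> list[int]:
    """Documented contract: returns the IDs of all users. Order unspecified."""
    # Implementation detail: sorted for internal convenience.
    # The sort order is NOT part of the documented interface.
    return sorted(db.values())


# --- caller that digs past the interface ---
def newest_user_id(db: dict[str, int]) -> int:
    ids = get_user_ids(db)
    # Accidentally load bearing: relies on the undocumented sort order.
    # If the maintainer later switches to insertion order, this silently breaks.
    return ids[-1]


if __name__ == "__main__":
    users = {"alice": 3, "bob": 17, "carol": 9}
    print(newest_user_id(users))  # prints 17 today, but only by accident
```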
I don't think you even need to go as far as you do here to undermine the "emergent convergence (on anti-human goals)" argument. Even if we allow that AIs, by whatever means, develop anti-human goals, what reason is there to believe that the goals (anti-human or otherwise) of one AI would be aligned with the goals of other AIs? Although infighting among different AIs probably wouldn't be good for humans, it certainly wouldn't help AIs, as a group, subdue humans.
Now let's bring in something which, while left out of the primary argument, repeatedly shows up in the footnotes and counter-counterarguments: AIs need some form of human cooperation to accomplish these nefarious "goals". Humans able to assist the AIs are a limited resource, so there is competition for them. There's going to be a battle among the different AIs for human "mind share".
Not only that, but if your goal is to create a powerful army of AIs, the last thing you'd want to do is make them all identical. Whatever reason you choose for why there are a huge number of AI instances in the first place -- as this argument assumes -- that reason would favor those AIs being diverse, not identical, and that very diversity argues against "emergent convergence". You then have to fall back on the "independently emerging common sub-goals" argument, which is a significantly bigger stretch because of the many additional assumptions it makes.
Isn't multi-epoch training most likely to lead to overfitting, making the models less useful/powerful?
If it were possible to write an algorithm to generate this synthetic training data, how would the resulting training data have any more information content than the algorithm that produced it? Sure, you'd get an enormous increase in training text volume, but large volumes of training data containing small amounts of information seem counterproductive for training purposes -- they will just bias the model disproportionately toward that small amount of information.
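To make the point concrete, here's a toy sketch (entirely made up, just for illustration): the generator below can emit a billion "training sentences", but everything in that output is determined by a couple of templates, a few word lists, and a seed, so the corpus carries no more information than the handful of lines that produced it.

```python
import itertools
import random

# Tiny "algorithm" that can generate an arbitrarily large synthetic corpus.
TEMPLATES = [
    "The {adj} {noun} {verb} the {noun2}.",
    "Every {noun} is {adj}, therefore the {noun2} {verb} it.",
]
WORDS = {
    "adj": ["red", "fast", "heavy"],
    "noun": ["model", "dataset", "token"],
    "noun2": ["benchmark", "loss", "gradient"],
    "verb": ["improves", "predicts", "matches"],
}


def synthetic_corpus(n_sentences: int, seed: int = 0):
    """Yield n_sentences of 'synthetic training text'."""
    rng = random.Random(seed)
    for _ in range(n_sentences):
        template = rng.choice(TEMPLATES)
        yield template.format(**{k: rng.choice(v) for k, v in WORDS.items()})


# A billion sentences cost almost nothing to produce, but their total information
# content is bounded by the templates, the word lists, and the seed.
for line in itertools.islice(synthetic_corpus(10**9), 5):
    print(line)
```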
Why wouldn't people (and maybe even AIs, at least up to a point) be applying these ever-advancing AI capabilities to developing better and better interpretability tools as well? I.e., what reason is there to expect an "interpretability gap" to develop (unless you believe interpretability is a fundamentally unsolvable problem, in which case no amount of AI power is going to help)?