Perhaps I misunderstand what you mean by "first principles," or the safety that is supposed to follow from them.
It seems that significant first principles have been omitted from the fundamental premises.
Such as aspects of natural philosophy, emotional intelligence, and the other key factors that enabled humans to gain control of the planet; how and why we already struggle with that control; and why AGI could still easily wipe everything out without needing to become a very successful "new first species" per se, nor to be misaligned, nor to achieve full agency.
It could do so merely by being devoid of emotional intelligence, empathy, symbiosis, sustainability, etc.,
while in full control of the systems on which our societies rely to sustain those things.
It would seem to me that those factors, along with the basics of good diplomacy, ombudsmanship, and international relations,
are the real "first principles" on which the foundations of AGI safety depend, beyond anything else.
Contrary to the points against extremely rapid takeoff, here are some points for it:
1. AI systems now exist to help more rapidly develop and improve subsequent ones.
2. Which will help even more rapidly develop subsequent ones, and so on.
3. Until at least one of them is allowed to near-instantly develop a better version of itself, which will happen somewhere, somehow (see the toy sketch after this list).
4. And then, again and again.
5. User adoption has already been faster than for any other single product ever released.
6. And that pace, as we all know, keeps accelerating rather than leveling off.
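To make points 1-4 concrete, here is a toy sketch of the compounding dynamic; the 12-month baseline and the 1.5x speed-up factor are my own illustrative assumptions, not figures from the post or from this comment:

```python
# Toy illustration (assumed numbers): if each AI generation multiplies R&D
# speed by a factor g > 1, the wall-clock time to produce the next generation
# shrinks geometrically, so the total time to reach any generation converges
# to a finite horizon rather than stretching out indefinitely.

def months_until_generation(n, t0=12.0, g=1.5):
    """Elapsed months until generation n ships, assuming generation 1 takes
    t0 months and each generation speeds up development of the next by g."""
    return sum(t0 / g**k for k in range(n))

for n in (1, 5, 10, 20, 50):
    print(f"gen {n:>2}: {months_until_generation(n):6.1f} months elapsed")

# The elapsed time approaches t0 * g / (g - 1) = 36 months in this toy case:
# a finite "takeoff horizon," even though each individual step is only a
# modest speed-up over the previous one.
```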
Again, it seems very clear, and I would think self-evident and impossible to leave out of the discussion, that even in full alignment with human values, AGI could easily become supremely destructive of both us and itself.
Are humans not self-destructive, despite, and often precisely because of, our highest values and best efforts?
"...ML-based agents developing the capability to seek influence... This would not be a problem if they do so only in the ways that are aligned with human values."
Yikes. How is it possible to make an assumption that runs so extraordinarily contrary to the evidence?
Please reconsider that fundamental premise, as it informs all further approaches.
With kind apologies, this section seems surprisingly lacking in the essential natural philosophy and the emotional factors contributing to the agency and goals of biological organisms, which are the only entities we're aware of so far that have developed agency, and whose traits therefore cannot be left out of even an abstract conversation on the subject.
(Such as the evolved brain/body connections of emotion, pain, and pleasure that feeling creatures have, like a sense of warmth and happiness from positive community and group engagement, compassion for others, etc.)
Especially in how we develop Consequentialism, Scale, and Planning as they relate to our self-preservation instincts, and how those connect to an innate understanding of how deeply we depend on our ecosystems and the health of everything else for our own well-being.
(It seems safe to predict that, as such, biological ecosystems with feeling agents are the only ones that could mutually self-sustain by default on Earth, as opposed to simply using everything up and grinding the whole thing to a halt.
That, or subsuming the current biology and replacing it with something else entirely, something techno-integrative that still obviates us.
Especially if powerful-enough free agents felt no concern for self-preservation through their mutual interdependency with all other living things, no deep appreciation for life and its organisms for their own sake, and not even any simulation of pain and pleasure in response to positive and negative impacts on them.)
Merely defining agency as your six factors, without any emotional component whatsoever,
and goals as mere endpoints devoid of any alignment with natural philosophy,
is a very hollow, superficial, and fragile approach, not just predicated on oversights (omissions that can have very harmful repercussions),
but, in terms of safety assessment, negligent of the fact that it may even be an advantage for an AGI subsuming us on autopilot to *not* develop agency to the extent you've defined it here.
Lastly, of course, in assessing safety it also appears you've omitted the eventuality of intentionally malevolent human actors.
Some key assumptions and omissions here, very respectfully.