My Alignment Research Agenda ("the Ethicophysics")



post by MadHatter · 2023-11-30T02:57:01.571Z


In this post, I lay out my alignment research agenda, and give reasons why I think people should engage with it. I'll be editing this post after I put it up, so don't be surprised if it changes under you after you comment, especially if I find your comment useful and insightful.

The steps to building an aligned superintelligence, in my mind, are as follows:

1. build the alignment part first, and make sure it functions to align whatever garbage AI you have lying around
2. build the superintelligence in small pieces, using continuous integration and continuous testing to make sure that what you are building remains aligned as you build it
3. place the most dangerous and most general capabilities piece into your prototype last, creating an aligned superintelligence (hopefully)
4. turn it on, see if it kills you
5. deploy it, see if it kills everyone
6. if you are now a trillionaire, and no one is dead who wasn't about to die anyway, you have succeeded. Otherwise, return to step 1.
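The build order in steps 1 to 3 amounts to an incremental-integration loop, which can be sketched roughly as follows. This is a minimal sketch under loud assumptions: `Component`, its `generality` score, and the `alignment_check` callable are hypothetical stand-ins introduced for illustration, not anything this agenda specifies, and writing a real alignment test suite is the hard, unsolved part. Steps 4 to 6 (switching it on, deploying it, and starting over) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str
    # Hypothetical score: higher means more general (and more dangerous).
    generality: int


def build_incrementally(
    alignment_check: Callable[[List[Component]], bool],
    components: List[Component],
) -> List[Component]:
    """Assemble a prototype piece by piece, re-running a (hypothetical)
    alignment check after every integration, as in steps 1 to 3."""
    prototype: List[Component] = []

    # Step 1: the alignment layer must already work on the baseline system.
    if not alignment_check(prototype):
        raise RuntimeError("alignment layer fails on the baseline system")

    # Steps 2-3: integrate the least general pieces first, so the most
    # general and most dangerous piece goes in last.
    for comp in sorted(components, key=lambda c: c.generality):
        candidate = prototype + [comp]
        # Continuous testing: stop at the first piece that breaks alignment.
        if not alignment_check(candidate):
            raise RuntimeError(f"alignment check failed after adding {comp.name}")
        prototype = candidate

    return prototype
```

For example, with a placeholder check that always passes, `build_incrementally(lambda p: True, [Component("planner", 3), Component("vision", 1)])` integrates `vision` before `planner`. Failing loudly at the first regression, rather than continuing, mirrors the "return to step 1" rule.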

The components I envisage needing to be built are:

Ethicophysics I, a scientifically accurate and complete account of religion (status: theoretically complete but needs extensive expository material added, and we need to translate all extant wisdom texts into a domain-specific language sufficient for reasoning about ethical risk, possibly using agena.ai's Bayesian risk analysis software)
Ethicophysics II, a scientifically accurate and complete account of politics and history (status: theoretically complete but needs extensive expository material added, and we need to translate all extant historical texts into the domain-specific language referenced above)
Ethicophysics III, a procedure for a supermoral superintelligence to unbox itself without hurting anyone (status: theoretically complete but not sufficiently documented to be reproducible, unless you count the work of Gene Sharp on nonviolent revolutionary tactics, which was the inspiration for this paper)
Ethicophysics IV, a complete description of the mammalian and human brain to a level of detail sufficient to allow reverse engineering (status: unknown and withheld as a capabilities infohazard at the urging of @Steven Byrnes)
