My Alignment Research Agenda ("the Ethicophysics")
post by MadHatter · 2023-11-30T02:57:01.571Z · LW · GW
In this post, I lay out my alignment research agenda, and give reasons why I think people should engage with it. I'll be editing this post after I put it up, so don't be surprised if it changes under you after you comment, especially if I find your comment useful and insightful.
The steps to building an aligned superintelligence, in my mind, are as follows:
- build the alignment part first, and make sure it functions to align whatever garbage AI you have lying around
- build the superintelligence in small pieces, using continuous integration and continuous testing to make sure that what you are building remains aligned as you build it
- place the most dangerous and most general capabilities piece into your prototype last, creating an aligned superintelligence (hopefully)
- turn it on, see if it kills you
- deploy it, see if it kills everyone
- if you are now a trillionaire, and no one is dead who wasn't about to die anyway, you have succeeded. Otherwise, return to the first step.
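The first three steps above can be read as a build loop: get the alignment checks working, then add pieces one at a time in order of increasing danger, re-running the checks after every addition. What follows is only a minimal sketch under loud assumptions. `Component`, its `risk` score, and the `passes_alignment_tests` callback are hypothetical names introduced for illustration; this post does not specify what an actual alignment test suite would contain.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    name: str  # hypothetical label for one piece of the system
    risk: int  # hypothetical danger/generality score; higher is riskier


def build_incrementally(
    components: List[Component],
    passes_alignment_tests: Callable[[List[Component]], bool],
) -> List[Component]:
    """Integrate components in order of increasing risk, re-testing each time."""
    system: List[Component] = []
    # Stand-in for "build the alignment part first": the checks must
    # already pass before any capability piece is integrated.
    if not passes_alignment_tests(system):
        raise RuntimeError("alignment checks fail before any component is added")
    # Sorting by risk puts the most dangerous, most general piece last.
    for component in sorted(components, key=lambda c: c.risk):
        candidate = system + [component]
        if not passes_alignment_tests(candidate):
            # Stop rather than keep a build that failed its checks.
            raise RuntimeError(
                f"alignment regression after adding {component.name}"
            )
        system = candidate
    return system
```

The loop halts at the first failed check instead of continuing, which mirrors ordinary continuous-integration practice: a regression is attributed to the single piece that was just added. The later steps (turning the system on and deploying it) are deliberately left out of the sketch.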
The components I envisage needing to be built are:
- Ethicophysics I, a scientifically accurate and complete account of religion (status: theoretically complete, but it needs extensive expository material, and all extant wisdom texts still need to be translated into a domain-specific language sufficient for reasoning about ethical risk, possibly using agena.ai's Bayesian risk analysis software)
- Ethicophysics II, a scientifically accurate and complete account of politics and history (status: theoretically complete, but it needs extensive expository material, and all extant historical texts still need to be translated into the domain-specific language referenced above)
- Ethicophysics III, a procedure for a supermoral superintelligence to unbox itself without hurting anyone (status: theoretically complete but not sufficiently documented to be reproducible, unless you count the work of Gene Sharp on nonviolent revolutionary tactics, which was the inspiration for this paper)
- Ethicophysics IV, a complete description of the mammalian and human brain to a level of detail sufficient to allow reverse engineering (status: unknown and withheld as a capabilities infohazard at the urging of @Steven Byrnes)