What other problems would a successful AI safety algorithm solve?

post by DirectedEvolution (AllAmericanBreakfast) · 2021-06-13T21:07:29.244Z · LW · GW · 4 comments

This is a question post.

Corporations and governments are in some ways like superintelligences, and in other ways not. Much of economics, political science, and sociology seems to tackle the problem of why institutions fail to align with human interests. Yet the difference in architecture and capabilities between brains and computer programs suggests to me that aligning collective bio-superintelligence is quite a different problem from aligning AI superintelligence. It might be that because we can see and control the building blocks of AI, and have no hard ethical limits on shaping it as we please, aligning AI with human values is an easier problem to solve than aligning human institutions with human values. A technical solution to AI safety might even be a necessary precursor to understanding the brain and human relationships well enough to provide a technical solution to aligning human institutions.

If we had a solid technical solution to AI safety, would it also give us technical solutions to the problem of human collective organization and governance? Would it give us solutions to other age-old problems? If so, is that reason to doubt that a technical solution to AI safety is feasible? If not, is that reason for some optimism? Finally, is our lack of a technical account of what human value is a hindrance to developing safe AI?


answer by MSRayne · 2021-06-14T15:32:45.768Z · LW(p) · GW(p)

I think you've got it the wrong way around. In fact, that's probably my biggest issue with the whole field of alignment. I think that it's probably easier to solve the problem of human institution alignment than AI alignment, and that to do so would help solve the AI alignment problem as well.

I say this because individual humans are already aligned to human values, and it should be possible by some means to preserve this fact even while scaling up to entire organizations. There is no a priori reason that this would be more difficult than literally reverse engineering the entire human mind from scratch! That is, it doesn't actually matter what human values are, if you believe that humans actually can be trusted to have them - all that matters is that you can be certain they are preserved by the individual-to-organization transition. So my position can be summarized as "designing an organization which preserves the human values already present is easier than figuring out what they are to begin with and injecting them into something with no humanity already in it."

As a matter of fact, this firm belief is the basis of my whole theory of what we ought to be doing as a species - I think AI should not be allowed to gain general intelligence, and that instead we should focus on creating an aligned superintelligence out of humans (with narrow AIs as "mortar", mere extensions to human capacities) - first in the form of an organization, later on as a "hive mind" using brain-computer interfaces to achieve varying degrees of voluntary mind-to-mind communication.

comment by Pattern · 2021-06-14T15:45:47.100Z · LW(p) · GW(p)
"reverse engineering the entire human mind from scratch!"

That might not necessarily be required for AGI, though that does seem to be what figuring out how to program values is.

comment by MSRayne · 2021-06-15T16:16:55.257Z · LW(p) · GW(p)

The latter is more what I was pointing to.


Comments sorted by top scores.

comment by Charlie Steiner · 2021-06-14T05:39:57.463Z · LW(p) · GW(p)

The best technical solution might just be "use the FAI to find the solution." Friendly AI is already, at its core, just a formal method for evaluating which actions are good for humans.

It's plausible we could use AI alignment research to "align" corporations, but only in a weakened sense where there's some process that returns good answers in everyday contexts. But for "real" alignment where the corporation somehow does what's best for humans with high generality... well, that means using some process to evaluate actions, so this is the case of using FAI.