Is technical AI alignment research a net positive?

post by cranberry_bear · 2022-04-12T13:07:56.289Z · LW · GW · 2 comments

I am concerned that technical AI alignment research could be increasing the risks from AGI, rather than decreasing them. Here are some reasons why, alongside some possible interventions (the analogies are purposefully crude to help illustrate my points, but I recognize that in reality the situation is not necessarily so clear-cut / black-and-white):

Curious to get some convincing rebuttals of these concerns. I do hope that technical AI safety research is a net positive; however, at the moment I am quite skeptical.

2 comments

Comments sorted by top scores.

comment by RasmusHB (JohannWolfgang) · 2022-04-12T17:27:50.232Z · LW(p) · GW(p)

Considering that the default alternative would be no alignment research, I would say, yes, it is a net positive. But I also agree that alignment research can be dual use, which covers your second and third points. I don't think the first one is a big problem, since comparatively few AI researchers seem to care about the safety of AGI to start with. Even if you believe that some approaches to alignment will not help and can only provide a false sense of certainty, pursuing them grows the field and can, IMO, only help attract more attention from the larger ML community. What do you imagine a solution to the last point would look like? Doesn't preventing malicious actors from seeking power mean solving morality or establishing some kind of utopia? Without having looked into it, I am pessimistic that we can find a way to utopia through political science.

comment by Evan R. Murphy · 2022-04-12T19:29:36.527Z · LW(p) · GW(p)

From my experience, alignment researchers tend to be very sensitive to the misuse risks of their alignment research and do take precautions accordingly. I think it's definitely a net positive.