Do any of the AI Risk evaluations focus on humans as the risk?

post by jmh · 2022-11-30T03:09:44.766Z · LW · GW · No comments

This is a question post.

I am not up on much of the AI risk discussion, but to this outsider most of the focus seems to be on the AI itself taking actions.

I recall someone (here, I think) posting a comment about how a bio-research AI initiative seeking to find beneficial things was asked whether its tools could be used to find harmful things. They changed their search and apparently found a number of really bad things really quickly.

Does anyone look at, or have concerns or estimates about, risk in this area? Is it possible that the risk from the emergence of a very powerful AI is less likely to materialize because, before that occurs, some human with a less powerful AI ends the world first, or at least destroys modern human civilization and sends us back to a stone-age hunter-gatherer world before the AI gets powerful enough to do that for/to us?

Answers

answer by Lone Pine (Conor Sullivan) · 2022-11-30T11:14:08.269Z · LW(p) · GW(p)

Around here, humans using AI to do bad things is referred to as "misuse risks", whereas "misaligned AI" is used exclusively to refer to the AI being the primary agent. There are many thought experiments where the AI convinces humans to do things which result in bad outcomes. "Execute this plan for me, human, but don't look at the details too hard please." This is still considered a case of misaligned AI.

If you break it down analytically, there need to be two elements for bad things to happen: the will to do so and the power to do so. As Daniel notes, some humans have already had the power for many decades, but fortunately none have had the will. AI is expected to be extremely powerful too, and AI will have its own will (including a will to power), so both misaligned AI and misuse risks are things to take seriously.

comment by jmh · 2022-12-01T15:33:54.810Z · LW(p) · GW(p)

Thanks for noting the terminology, useful to have in mind.

I have a follow-on comment and question in my response to Daniel, and I would be interested in your response/reaction.

answer by Daniel Kokotajlo · 2022-11-30T03:39:42.312Z · LW(p) · GW(p)

Is it possible that the risk from the emergence of a very powerful AI is less likely to materialize because, before that occurs, some human with a less powerful AI ends the world first, or at least destroys modern human civilization and sends us back to a stone-age hunter-gatherer world before the AI gets powerful enough to do that for/to us?

It's definitely a possibility I and other people have thought about. My view is that takeoff will be fast enough that this outcome is unlikely; most humans don't want to destroy civilization and so before one of the exceptions gets their hands on AI powerful enough to destroy civilization when used deliberately for that purpose by humans, someone else will have their hands on AI that is even more powerful, powerful enough to destroy civilization 'on its own.'

Consider: Nukes and bio weapons can destroy the world already, but for decades the world has persisted, because none of the hundred or so actors capable of destroying the world have wanted to do so. Really I'm not relying on a fast takeoff assumption here, more like a not-multi-decade-long takeoff assumption.

comment by jmh · 2022-12-01T16:27:50.579Z · LW(p) · GW(p)

Thanks. I was somewhat expecting the observation that humans already have the ability to pretty much end things now, and have for some time, yet have not done so. I do agree. I also agree that, in general, we have put preventative measures in place to ensure that those who might be willing to end the world don't have the access or unilateral ability to do so.

I think that intent might not be the only source; error and unintended consequences from using AI tools seem like part of the human risk profile. However, that seems so obvious that I assume you have it baked into your assessment and just didn't mention it to keep the answer simple. I'm not sure how much it shifts the balance, though.

I did just have the realization that human-based risk and AI risk are best thought of differently than how I initially framed the question in my own mind. AI risk is much more like the risk to some other species from human actions than the risk to humans from human actions. That shift in view argues for the same assessment you offer.

I'm not sure what I think the relationship looks like between AI-enabled capabilities and the probability of some human-driven event, intentional or unintended. I suspect that the probability increases with AI functionality. But I also think that points to two types of response. One is slowing, or otherwise proceeding more cautiously with, AI research -- which dovetails well with AI risk efforts. The other is employing and extending existing social tools and institutions related to risk management, which would help reduce the risk while allowing research to proceed as is.

For instance, one reason I think we haven't seen nuclear doomsday is that no one person actually has the ability to launch an all-out attack (that might not be true now with NK, but I know nothing about its nuclear protocols). Both structural checks and the underlying personal checks are present. Are there AI risk mitigation parallels? (I assume so, given I've seen some comments about AI mergers that seem to suggest that gets the AI around constraints protecting humans, but I don't really know if that is a fair/useful characterization of such efforts.)

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-12-01T18:00:55.055Z · LW(p) · GW(p)

It would help if you gave examples of scenarios in which the world is destroyed by accidental use of AI tools (as opposed to AI agents). I haven't been able to think of plausible ones so far, but I haven't thought THAT much about it & so wouldn't be surprised if I've missed some. In case you are interested, here's some thinking I did two years ago on a related topic.

Replies from: jmh, jmh
comment by jmh · 2022-12-02T23:37:35.400Z · LW(p) · GW(p)

The linked tool looks interesting; thanks for sharing! 

I have not done more than skim through the list of configuration options, so I don't have any good feedback for you (though I don't guarantee I could offer good feedback even after a complete review and testing ;-) ). A couple of the options do seem to touch on my question here, I think: the ones related to medical and biotech. I think your approach is about successful efforts in those areas changing the future state of a realized AGI, whereas my question would best be viewed as the intersection of ongoing AI/ML work and work in those areas.

I was also trying to provide an example, but decided I shouldn't just give an off-the-cuff one, so I want to write something and then reread and probably rewrite it. That's probably setting expectations way too high, but I did want to make sure I can clearly describe a scenario rather than just dump a stream-of-consciousness blob on you. Still, I wanted to thank you for the link and response.

comment by jmh · 2022-12-04T18:25:21.580Z · LW(p) · GW(p)

I've gone back and forth in my mind on examples, and the biggest challenge is finding a plausible one, rather than one that is merely imaginable or possible. I think the best way to frame the concern (not quite an example, but close) would be gain-of-function-type research.

Currently I think the vast majority of that work is conducted in expensive labs (probably BSL-3 or 4, though I might be wrong on that) by people with a lot of time, effort, and money invested in their educations. It's a complicated enough area that lacking the education makes even knowing how to start a challenge. However, I don't think the basic work requires all that much in the way of specialized lab equipment. Most of the equipment is probably more about the productivity of getting results/findings than about the actual attempted modification.

On top of that, we also have some limitations, legal and regulatory, on access to certain materials. But I think those are more about specific items, e.g. anthrax, and not really a barrier to conducting gain-of-function-type research. Everyone has access to lots of bacteria and viruses, but most lack knowledge of isolation and identification techniques.

Smart tools which embody the knowledge and experience, and which include good ML functionality, really would open the door to home hobbyists who got interested in just playing around with some "harmless" gain of function or other genetic engineering. But if they don't understand how to properly contain their experiments, or don't understand that robust testing is not just testing for a successful result (however that might be defined) but also testing for harmful outcomes, then risks have definitely increased if we do actually see an increase in such activity by lay people.

I'm coming to the conclusion, though, that perhaps the way to address this type of risk is really outside the AI alignment focus, since a fair amount of the mitigation is probably in how we apply existing controls to the evolution of smart tool use. Just as now, some things cannot be ordered simply by placing the order and making payment. So maybe here the solution is more about qualifying access to smart tools once they reach some capability threshold -- though I think this is also a problematic solution -- so that not just any unqualified person can play with them merely because they have an interest.

I also think a better way to frame the question might be: "Given existing unintentional existential risk from human actions, what is the relationship between AI and the probability of such an outcome?" Or, perhaps more specifically: "Given that gain-of-function research will produce a human-civilization-ending pandemic with probability X, is X a function of AI advancement, and if so, what direction and shape does it take?"
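
To put that a bit more formally (just a sketch of the framing on my part; the functional form is entirely assumed, not estimated):

$$
X = f(a), \qquad a = \text{level of AI advancement},
$$

and the question is whether $f'(a) > 0$, i.e. whether the probability of such a pandemic rises with capability, and if so whether $f$ grows gradually or jumps sharply once smart tools cross some accessibility threshold for lay users.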

Given Conor's comment on the possibility of a misleading AI, one might think that X increases with the advancement of AI. But whether that would lead to humans being the greater threat prior to the emergence of a malicious AGI (or just an uncaring one that needs humans out of the way), I don't know.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-12-05T05:03:04.176Z · LW(p) · GW(p)

Yeah, I agree that one is fairly plausible. But still I'd put it as less likely than "classic" AGI takeover risk, because classic AGI takeover risk is so large and so soon. I think if I had 20-year timelines then I'd be much more concerned about gain-of-function type stuff than I currently am.
