AI things that are perhaps as important as human-controlled AI

post by Chi Nguyen · 2024-03-03T18:07:24.291Z · LW · GW · 4 comments

4 comments


comment by Random Developer · 2024-03-03T20:03:11.406Z · LW(p) · GW(p)

Making AIs wiser seems most important in worlds where humanity stays in control of AI. It’s unclear to me what the sign of this work is if humanity doesn’t stay in control of AI.

A significant fraction of work on AI assumes that humans will somehow be able to control entities which are far smarter than we are, and maintain such control indefinitely. My favorite flippant reply to that is, "And how did that work out for Homo erectus? Surely they must have benefited enormously from all the technology invented by Homo sapiens!" Intelligence is the ultimate force multiplier.

If there's no mathematical "secret" to alignment, and I strongly suspect there isn't, then we're unlikely to remain in control.

So I see four scenarios if there's no magic trick to stay in control:

  1. We're wise enough to refrain from building anything significantly smarter than us.
  2. We're pets. (Loss of control)
  3. We're dead. (X-risk)
  4. We envy the dead. (S-risk)

I do not have a lot of hope for (1) without dramatic changes in public opinion and human society. I've phrased (2) provocatively, but the essence is that we would lose control. (Fictional examples are dangerous, but this category would include the Culture, CelestAI or arguably the Matrix.) Pets might be beloved or they might be abused, but they rarely get asked to participate in human decisions. And sometimes pets get spayed or euthanized based on logic they don't understand. They might even be happier than wild animals, but they're not in control of their own fate.

Even if we could control AI indefinitely (and I don't think we can), there is literally no human organization or institution I would trust with that power. Not governments, not committees, and certainly not a democratic vote.

So if we must regrettably build AI, and lose all control over the future, then I do think it matters that the AI has a decent moral and philosophical system. What kind of entity would you trust with vast, unaccountable, inescapable power? If we're likely to wind up as pets of our own creations, then we should definitely try to create kind, ethical and what you call "unfussy" pet owners, and ones that respect real consent.

Or to use a human analogy, try to raise the sort of children you'd want to pick your nursing home. So I do think the philosophical and moral questions matter even if humans lose control.

comment by Wei Dai (Wei_Dai) · 2024-03-07T16:09:52.141Z · LW(p) · GW(p)

Thanks for collecting works/discussions in this area and offering your own takes. It's great to see more interest in how to improve AI safety beyond keeping humans in control, and I hope the recent trend continues.

You have several links to Will MacAskill talking about working in this area, but didn't link to the specific comment/shortform [EA(p) · GW(p)], only to his overall "quick takes" page.

The important takeaway is that future AI-powered humans might set themselves up for cooperation failure by learning too much too quickly. This would be particularly tragic if it resulted in acausal conflict.

There's too little in this section for me to understand how you arrived at this conclusion/concern. It might benefit from a bit more content or references. (Other sections may also benefit from this, but I'm already more familiar with those topics and so may not have noticed.)

comment by Caspar Oesterheld (Caspar42) · 2024-03-03T18:39:26.241Z · LW(p) · GW(p)

In short, the idea is that there might be a few broad types of “personalities” that AIs tend to fall into depending on their training. These personalities are attractors.

I'd be interested in why one might think this to be true. (I only did a very superficial ctrl+f on Lukas' post -- sorry if that post addresses this question.) I'd think that there are lots of dimensions of variation and that within these, AIs could assume a continuous range of values. (If AI training mostly works by training to imitate human data, then one might imagine that (assuming inner alignment) they'd mostly fall within the range of human variation. But I assume that's not what you mean.)

comment by Kajus · 2024-03-04T12:10:43.718Z · LW(p) · GW(p)

Interesting! Reading this makes me think there is some tension with the "paperclip maximizer" view of AI. Some interventions or risks you mention assume that an AI will get its attitudes from its training data, while a "paperclip maximizer" is an AI with just a goal, plus whatever beliefs help it achieve that goal. I guess the assumption here is that the AI will be much more human-like in some way.