Edmund's Short form

post by Edmund Nelson (edmund-nelson) · 2022-03-05T02:03:37.694Z · LW · GW · 2 comments

Comments sorted by top scores.

comment by Edmund Nelson (edmund-nelson) · 2022-03-05T02:04:01.536Z · LW(p) · GW(p)

At what point is AI judged to have "superhuman performance"?

From what I can tell, there are roughly 4 stages of "AI performance":

Stage 1: subhuman (this covers a lot of ground). The AI is unable to perform as well as a top human in the field (such AIs can still be useful).

Stage 2: the "superhuman" stage. AIs outperform humans on normal variations of the task.

Stage 3: the "adversarial" stage. Humans find adversarial examples that let them outperform the AI (this is most relevant in games). Example: StarCraft 2, where exploitative play allows a human to beat the AI (this AI beat the then-top human Serral in a best of 5).

 

Stage 4: the "god" phase. Even with adversarial examples, humans are outperformed by the computer (e.g. chess).
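As a rough way to pin down the four stages, here is a minimal sketch that classifies an AI from two numbers: its win rate against top humans in normal play, and its win rate against the best known anti-AI tactics. The function name `classify_stage`, the 50% cutoff, and the idea of passing `None` when no exploit has been tried are all assumptions made for illustration, not an established standard.

```python
from typing import Optional


def classify_stage(normal_win_rate: float,
                   adversarial_win_rate: Optional[float] = None) -> int:
    """Return the stage (1-4) for an AI.

    normal_win_rate: AI's win rate vs. top humans in ordinary play.
    adversarial_win_rate: AI's win rate vs. humans using the best known
        anti-AI tactics, or None if no such tactics have been found yet.
    The 0.5 threshold is an arbitrary illustrative cutoff.
    """
    if normal_win_rate <= 0.5:
        return 1  # subhuman: loses to top humans in normal play
    if adversarial_win_rate is None:
        return 2  # "superhuman" on normal variations, exploits not yet known
    if adversarial_win_rate <= 0.5:
        return 3  # adversarial stage: humans win with anti-AI tactics
    return 4      # "god" phase: wins even against anti-AI tactics
```

Under this sketch, an AI that wins 90% of normal games but only 20% against exploitative play comes out as `classify_stage(0.9, 0.2) == 3`.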


Obviously AIs can skip stage 3 entirely, and that does happen. But I hear conflicting claims about stage 3: many people argue we have superhuman results in StarCraft 2, but unless there is an AI more advanced than BlizzCon AlphaStar, it appears we are at stage 3 (humans can reliably beat the AI with anti-AI tactics, but normal play loses). Is stage 3 generally considered "superhuman"?

AlphaStar vs Serral link (is there a way to collapse these?)

Replies from: TLW
comment by TLW · 2022-03-06T05:58:29.562Z · LW(p) · GW(p)

Part of this is people analyzing AIs in adversarial contexts through the lens of non-adversarial contexts when they really shouldn't be.

In a non-adversarial context, an AI that beats X 95% of the time when the top human beats X 90% of the time is often considered superhuman. And so you get people calling e.g. AlphaStar superhuman because it beat the top human.

In an adversarial context, where that other 5% lies in the state space matters a lot. (E.g. if it's in a region that an opponent can steer towards, that's a problem.)

*****

(Part of this is also that AI training is far more tolerant of low-but-ahead win rates than humans are. A human will often lean towards weaker strategies that are more resistant to glass jaws, especially in tournament settings: they are trying to win the tournament, not, strictly speaking, get the highest % of wins.)

*****

I sometimes think that the dual benchmark of 'what's the highest rank person X beats >40%[1] of the time' and 'what's the lowest rank person that beats X >60%[1] of the time' would be more useful in evaluating AI progress.
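To make the dual benchmark concrete, here is a minimal sketch. It assumes you have, for each human opponent, their rank (1 = best) and the AI's win rate against them, and that there are no draws, so the human's win rate is one minus the AI's. The function name `dual_benchmark`, the data layout, and the sample numbers are made up for illustration; the 40%/60% defaults are the same semi-arbitrary thresholds as above.

```python
def dual_benchmark(ai_win_rates, lower=0.40, upper=0.60):
    """Compute the two ranks of the dual benchmark.

    ai_win_rates: iterable of (human_rank, ai_win_rate), rank 1 = best human.
    Returns a pair:
      - rank of the highest-ranked human the AI beats more than `lower`
        of the time (None if there is no such human),
      - rank of the lowest-ranked human who beats the AI more than `upper`
        of the time (None if there is no such human).
    Assumes no draws, so human win rate = 1 - ai_win_rate.
    """
    ai_win_rates = list(ai_win_rates)
    best_beaten = min(
        (rank for rank, w in ai_win_rates if w > lower), default=None
    )
    worst_beater = max(
        (rank for rank, w in ai_win_rates if (1 - w) > upper), default=None
    )
    return best_beaten, worst_beater


# Hypothetical data: the AI is ahead against ranks 1 and 2,
# but rank 50 has found an anti-AI strategy.
results = [(1, 0.55), (2, 0.70), (50, 0.30), (200, 0.95)]
print(dual_benchmark(results))
```

With these made-up numbers, the first figure is rank 1 (the AI holds its own against the top human) while the second is rank 50 (a much weaker player reliably beats it). A wide gap between the two is the signature of an AI that looks superhuman in normal play but is still exploitable.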

  1. ^

    Semi-arbitrary numbers, don't read too much into it.