Why study perturbative adversarial attacks?

post by Yuxi_Liu · 2019-08-29T20:15:59.364Z · LW · GW · 1 comments

Contents

  Summary of paper
  My speculations

This post summarizes and comments on Motivating the Rules of the Game for Adversarial Example Research.

Summary of paper

Despite the large amount of recent work, human-imperceptible perturbation adversarial attacks (example: the One Pixel Attack) are not as useful as researchers may think, for two reasons:

  1. They are not based on realistic attacks against these AI systems.

we were unable to find a compelling example that required indistinguishability.

... the best papers on defending against adversarial examples carefully articulate and motivate a realistic attack model, ideally inspired by actual attacks against a real system.

There are much better attack methods that a real adversary could use, such as the "test-set attack" quoted in the comment below.

  2. They are not very fruitful for improving robustness.

In practice, the best solutions to the l_p problem are essentially to optimize the metric directly and these solutions seem not to generalize to other threat models.
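
To make the threat model concrete, here is a minimal sketch (mine, not the paper's) of a typical l_p perturbation attack: projected gradient ascent on the loss inside a small l_inf ball. The classifier `model` and the labeled batch `(x, y)` are hypothetical stand-ins for a differentiable PyTorch classifier and its data.

```python
# Minimal sketch (not from the paper) of an l_inf perturbation attack:
# projected gradient ascent on the loss. `model`, `x`, `y` are hypothetical
# stand-ins for a PyTorch classifier and one labeled batch of images in [0, 1].
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return x + delta with ||delta||_inf <= eps, chosen to increase the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.data = (x + delta.data).clamp(0, 1) - x
        delta.grad.zero_()
    return (x + delta).detach()
```

Defenses that "optimize the metric directly" essentially train against perturbations generated this way, which is why, per the quote above, they tend not to transfer to other threat models.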

My speculations

If so much work has been done for such dubious gains, I have two bitter questions:

  1. Why did they work on the perturbation attacks so much?
  2. Why are these works so fun to read?

The second question partially answers the first: because they are fun. But that can't be the only explanation. I think the other explanation is that perturbational adversarial examples are easy: they can be defined in one short equation and generated without domain knowledge (just like the neural networks themselves).
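
The "one short equation" is, in the standard formulation (my notation, not anything specific to this post): given a classifier f_θ with training loss L, a correctly classified input x with label y, and a perturbation budget ε, find

```latex
\max_{\delta \,:\, \|\delta\|_p \le \varepsilon} \; L\big(f_\theta(x + \delta),\, y\big)
```

Everything needed to state and attack this problem is already in the training pipeline: the model, the loss, and a norm. No knowledge of the image domain is required.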

As for why these works are so fun to read, I think it's because they are extremely humorous and confirm comforting beliefs about human superiority. The humor comes from the contrast between tiny perturbations in input and big perturbations in output, between incomprehensible attacks and comprehensible results, and between the strange behavior of neural networks and the familiar behavior of humans.


Gilmer, Justin, Ryan P. Adams, Ian Goodfellow, David Andersen, and George E. Dahl. “Motivating the Rules of the Game for Adversarial Example Research.” arXiv preprint arXiv:1807.06732, 2018.

1 comment


comment by Matthew Barnett (matthew-barnett) · 2019-08-30T18:17:03.503Z · LW(p) · GW(p)
Test-set attack. Just keep feeding it natural inputs until it gets an error. As long as the system is not error-free, this will succeed.

This is what some researchers have recently tried. The paper is Natural Adversarial Examples, and the images are pretty interesting.
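
For concreteness, a minimal sketch of the test-set attack described in the quoted excerpt; `model` and `labeled_stream` are hypothetical stand-ins for a classifier and a stream of natural (input, label) pairs.

```python
# Minimal sketch of the "test-set attack": keep feeding the model natural
# inputs until it makes a mistake. Any misclassified natural input serves
# as the "adversarial" example; no perturbation is needed.
def test_set_attack(model, labeled_stream):
    for x, y in labeled_stream:
        if model.predict(x) != y:
            return x
    return None  # only possible if the model made no errors on the stream
```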