Researchers have shown that deep neural network classifiers can be fooled by adversarial examples: inputs with small, carefully chosen feature perturbations that cause the model to produce incorrect output.
Until now, adversarial examples have generally been crafted intentionally in order to fool deep neural networks. However, in recent work, researchers from UC Berkeley, the University of Washington and the University of Chicago have shown that natural adversarial examples exist in large numbers.
They curated a dataset of real-world, natural adversarial examples, i.e. unmodified samples that cause classifier accuracy to drop significantly. The new dataset, called ImageNet-A, contains 7,500 such natural adversarial examples.
The researchers used a fairly simple process to collect real-world, unmodified adversarial examples. They downloaded a large set of images related to ImageNet classes and fed them into a ResNet-50 classifier. They then discarded the correctly classified samples and, from the misclassified images, manually picked high-quality "natural adversarial examples".
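To make the automatic filtering step concrete, here is a minimal PyTorch sketch, not the authors' actual pipeline. It assumes the downloaded candidates sit in a hypothetical `downloaded_candidates` directory arranged in class folders, and that the folder indices already line up with the ImageNet labels used by the pretrained ResNet-50 (in practice a WordNet-ID-to-index mapping would be needed); the manual review criteria described in the paper are not reproduced here.

```python
# Sketch of the automatic filtering step: keep only images that a pretrained
# ResNet-50 misclassifies, as candidates for later manual review.
# Assumption: class-folder indices match the ImageNet indices of the model.
import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True).eval()
candidates = datasets.ImageFolder("downloaded_candidates", transform=preprocess)
loader = torch.utils.data.DataLoader(candidates, batch_size=64, shuffle=False)

misclassified = []
with torch.no_grad():
    for i, (images, labels) in enumerate(loader):
        preds = model(images).argmax(dim=1)
        for j in torch.nonzero(preds != labels).flatten().tolist():
            # Record the file path of every wrongly classified image;
            # these become candidates for manual selection.
            path, _ = candidates.samples[i * loader.batch_size + j]
            misclassified.append(path)

print(f"{len(misclassified)} candidate natural adversarial examples")
```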
The collected dataset was then used as an adversarial test set to measure the accuracy of popular classifiers. The researchers showed a significant drop in accuracy for many of them: DenseNet-121, for example, obtains merely 2% classification accuracy on the novel ImageNet-A dataset, a drop of roughly 90%.
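Such an evaluation can be sketched as follows, again under simplifying assumptions: the `imagenet-a` directory name and the folder naming convention are hypothetical, and the sketch assumes each class folder is named with its ImageNet class index, whereas the released dataset uses WordNet IDs, so a WordNet-ID-to-index mapping would be needed in practice.

```python
# Sketch of measuring top-1 accuracy of a pretrained classifier on ImageNet-A.
# Assumption: class folders are named with their ImageNet class indices.
import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.densenet121(pretrained=True).eval()
dataset = datasets.ImageFolder("imagenet-a", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=False)

# Map ImageFolder's local class indices back to ImageNet's 1,000-class indices
# (ImageNet-A covers a subset of them).
folder_to_imagenet_idx = {i: int(name) for i, name in enumerate(dataset.classes)}

correct, total = 0, 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        targets = torch.tensor(
            [folder_to_imagenet_idx[l.item()] for l in labels])
        correct += (preds == targets).sum().item()
        total += labels.size(0)

print(f"Top-1 accuracy on ImageNet-A: {100.0 * correct / total:.2f}%")
```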
In their paper, the researchers examine how training techniques for improving robustness affect accuracy on the new dataset, and they propose simple architectural changes that address some of the robustness problems. They present the new dataset as a strong benchmark and a challenge for developing more robust classifiers.
The dataset has been open-sourced and can be found here. More details about the data collection process and the classifier evaluations are available in the paper on arXiv.