Black-box Attack in Partial Auxiliary Information Setting
We designed a transfer-based black-box attack that generates adversarial examples in a partial auxiliary information setting, in which the training dataset of the local models only partially overlaps with the training dataset of the black-box target model. In a traditional transfer-based attack, the attacker perturbs a clean image until the local model misclassifies it and then sends the perturbed image to the target model to see whether the attack transfers. Most transfer-based attacks assume that the local and target models are trained on overlapping data samples and have similar architectures, which is rarely realistic in practice. To relax this assumption, we adopted a self-supervised local model to extract semantic features from an unlabeled dataset. I combined this model with the projected gradient descent (PGD) method to generate adversarial examples.
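The combination of a feature extractor with projected gradient descent can be illustrated with a minimal sketch. It is not the implementation used in this project: the tanh encoder `encode`, the weight matrix `W`, the squared feature-distance objective, and the values of `eps`, `step`, and `iters` are all assumptions chosen to keep the example self-contained. The sketch perturbs an input inside an L-infinity ball so that its features move away from the features of the clean input, without using any labels.

```python
import numpy as np

def encode(x, W):
    """Toy stand-in (assumption) for a self-supervised encoder: tanh(W x)."""
    return np.tanh(W @ x)

def feature_pgd(x, W, eps=0.1, step=0.02, iters=20, seed=0):
    """L-infinity PGD that pushes the features of x_adv away from those of x."""
    rng = np.random.default_rng(seed)
    f_clean = encode(x, W)
    # Random start inside the eps-ball: at x_adv == x the gradient is zero.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(iters):
        f_adv = encode(x_adv, W)
        # Gradient of ||f_adv - f_clean||^2 w.r.t. x_adv (chain rule through tanh).
        grad = W.T @ (2.0 * (f_adv - f_clean) * (1.0 - f_adv ** 2))
        x_adv = x_adv + step * np.sign(grad)       # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep pixel values valid
    return x_adv

# Hypothetical usage on a random 16-pixel "image" and an 8-dimensional encoder.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=16)
W = rng.normal(size=(8, 16))
x_adv = feature_pgd(x, W)
print("feature distance:", np.linalg.norm(encode(x_adv, W) - encode(x, W)))
```

In a transfer setting, the resulting `x_adv` would then be sent to the target model as a candidate adversarial example or as a starting point for a query-based attack.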
I started by using an auto-encoder as the local model, and our method worked well on the MNIST dataset. Compared to the naive black-box attack, using a single auto-encoder as the auxiliary model reduced the number of queries by as much as 86% (±3%). However, the results on the slightly more complex CIFAR-10 dataset were not encouraging. I compared the distance between the features generated by the self-supervised local model and those generated by the supervised target model. It turned out that the semantic features differ significantly even when the two models have the same architecture. Therefore, I tried self-supervised algorithms such as pre-trained deep auto-encoder, ensemble masked auto-encoders, Simple Framework for Contrastive Learning of Visual Representations, and Momentum Contrast for Unsupervised Visual Representation Learning, hoping to reduce the gap. Even with these techniques, the query reduction was only around 11% (±5%) and may be even smaller on larger datasets such as ImageNet.
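A feature comparison of this kind can be sketched as follows. The text above does not state which distance was used, so the mean cosine distance here is an assumption, as is the function name `mean_cosine_distance`. The sketch also assumes that both models produce features of the same dimension for the same batch of inputs; since independently trained models need not share a coordinate system, a measure that is invariant to rotations of the feature space may be more appropriate in practice.

```python
import numpy as np

def mean_cosine_distance(feats_a, feats_b):
    """Mean of (1 - cosine similarity) over paired rows of two (n, d) arrays."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

# Hypothetical usage: random arrays standing in for the two models' features.
rng = np.random.default_rng(0)
local_feats = rng.normal(size=(100, 32))    # e.g. self-supervised local model
target_feats = rng.normal(size=(100, 32))   # e.g. supervised target model
print("mean cosine distance:", mean_cosine_distance(local_feats, target_feats))
```

A value near 0 means the paired features point in nearly the same direction, while values near 1 indicate roughly unrelated directions.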
- Fnu Suya, Jianfeng Chi, David Evans, and Yuan Tian. Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries. In USENIX Security Symposium, 2020.
- Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A Simple Framework for Contrastive Learning of Visual Representations. In International Conference on Machine Learning, 2020.
- Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum Contrast for Unsupervised Visual Representation Learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.