1. AI Security
(Adversarial Training) Topology-Preserving Adversarial Training
Title: Topology-Preserving Adversarial Training
Link: https://arxiv.org/abs/2311.17607
Authors: Xiaoyue Mi, Fan Tang, Yepeng Weng, Danding Wang, Juan Cao, Sheng Tang, Peng Li, Yang Liu
Abstract: Despite its effectiveness in improving the robustness of neural networks, adversarial training suffers from the natural accuracy degradation problem, i.e., accuracy on natural samples is reduced significantly. In this study, we reveal through quantitative and qualitative experiments that natural accuracy degradation is highly related to the disruption of the natural sample topology in the representation space. Based on this observation, we propose Topology-pReserving Adversarial traINing (TRAIN) to alleviate the problem by preserving, during adversarial training, the topology structure of natural samples from a standard model trained only on natural samples. As an additional regularization, our method can easily be combined with various popular adversarial training algorithms in a plug-and-play manner, taking advantage of both sides. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny ImageNet show that our proposed method achieves consistent and significant improvements over various strong baselines in most cases. Specifically, without additional data, our proposed method achieves up to 8.78% improvement in natural accuracy and 4.50% improvement in robust accuracy.
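The abstract describes TRAIN as standard adversarial training plus an extra regularizer that keeps the topology of natural samples in representation space aligned with a frozen model trained only on natural samples. The sketch below is a minimal, hypothetical rendering of that general recipe in PyTorch, not the paper's exact loss: the PGD inner attack, the cosine-similarity graph, the KL discrepancy, and names such as `standard_model`, `features_fn`, and `topology_weight` are all illustrative assumptions.

```python
# Hypothetical sketch of "adversarial training + topology-preservation regularizer".
# Not the paper's exact formulation; choices below are assumptions for illustration.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD attack used as the adversarial-training baseline."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def pairwise_similarity(feats, temperature=0.1):
    """Row-normalized cosine-similarity graph over a batch of feature vectors."""
    feats = F.normalize(feats, dim=1)
    return F.softmax(feats @ feats.t() / temperature, dim=1)

def train_step(robust_model, standard_model, features_fn, x, y, topology_weight=1.0):
    """One adversarial-training step plus a topology-preservation term.

    `features_fn(model, x)` is an assumed hook returning (batch, dim) features,
    e.g. the penultimate-layer activations.
    """
    x_adv = pgd_attack(robust_model, x, y)
    adv_loss = F.cross_entropy(robust_model(x_adv), y)

    # Topology term on *natural* samples: match the robust model's similarity
    # graph to the frozen standard model's graph (KL divergence is one plausible
    # choice of discrepancy; the paper may use a different one).
    with torch.no_grad():
        ref_graph = pairwise_similarity(features_fn(standard_model, x))
    cur_graph = pairwise_similarity(features_fn(robust_model, x))
    topo_loss = F.kl_div(cur_graph.log(), ref_graph, reduction="batchmean")

    return adv_loss + topology_weight * topo_loss
```

Because the extra term only touches natural samples, it can be added on top of other adversarial-training objectives in the plug-and-play fashion the abstract describes.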
(Adversarial Attacks) Group-wise Sparse and Explainable Adversarial Attacks
Title: Group-wise Sparse and Explainable Adversarial Attacks
Link: https://arxiv.org/abs/2311.17434
Authors: Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
Abstract: Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, typically regularized by the ℓ0 norm.
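Only the first sentence of this abstract survives above, so the group-wise method itself is not described here. As a generic illustration of the ℓ0-sparse setting that sentence refers to, the hypothetical sketch below perturbs at most k pixels per image by keeping only the largest-magnitude gradient locations; `k`, `eps`, and the one-step sign update are assumptions for illustration, not the paper's attack.

```python
# Hypothetical sketch of an l0-sparse (at most k perturbed pixels) attack.
# This is NOT the paper's group-wise method, only a generic illustration.
import torch
import torch.nn.functional as F

def sparse_attack(model, x, y, k=50, eps=0.1):
    """One-step attack that modifies at most k pixels per image."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]          # (B, C, H, W)

    # Rank pixels by gradient magnitude aggregated over channels.
    pixel_scores = grad.abs().sum(dim=1)            # (B, H, W)
    flat = pixel_scores.flatten(1)                  # (B, H*W)
    topk = flat.topk(k, dim=1).indices
    mask = torch.zeros_like(flat).scatter_(1, topk, 1.0)
    mask = mask.view_as(pixel_scores).unsqueeze(1)  # (B, 1, H, W)

    # Perturb only the selected pixels (sign step of magnitude eps).
    x_adv = (x + eps * grad.sign() * mask).clamp(0, 1)
    return x_adv.detach()
```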