The top proof features of DNNs trained with different methods rely on different input features. In this section, we interpret proof features obtained with SuPFEx and use these interpretations to qualitatively check whether the dissimilarities are also evident in the invariants captured by the different proofs of the same robustness property on standard and robustly trained networks. We also study the effect of certified robust training methods like CROWN-IBP (Zhang et al., 2020), empirically robust training methods like PGD (Madry et al., 2018), and training methods that combine adversarial and certified training like COLT (Balunovic & Vechev, 2020) on the proof features. For a local input region φ, we say that a robustness proof is semantically meaningful if it focuses on the relevant features of the output class for images contained inside φ and not on spurious features. In the case of MNIST or CIFAR-10 images, spurious features are the pixels that form part of the background of the image, whereas important features are the pixels that belong to the actual object being identified by the network. The gradient map of the extracted proof features w.r.t. the input region φ gives us an idea of the input pixels that the network focuses on. We obtain the gradient maps by computing the mean gradient over 100 samples drawn uniformly from φ, as described in Section 4.3. As in (Tsipras et al., 2019), to avoid introducing any inherent bias into the proof feature visualization, no preprocessing (other than scaling and clipping for visualization) is applied to the gradients obtained for each individual sample. In Fig. 2, we compare the gradient maps corresponding to the top proof feature (the one having the highest priority P_ub(F_ni)) on networks from Table 1 on representative images of different output classes in the MNIST and CIFAR-10 test sets.
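A minimal sketch of this gradient-map computation is given below. It assumes a PyTorch model that exposes its penultimate-layer activations through a hypothetical `penultimate` method (the proof feature corresponds to one neuron in that layer); `feature_index` and `epsilon` (the L∞ radius of φ) are illustrative parameters, not names from the paper.

```python
import torch

def proof_feature_gradient_map(model, x, feature_index, epsilon, n_samples=100):
    """Mean input gradient of one proof feature over samples drawn uniformly
    from the L-infinity region phi = [x - epsilon, x + epsilon], clipped to [0, 1]."""
    grads = []
    for _ in range(n_samples):
        # Draw one sample uniformly from the perturbation region phi.
        noise = (2 * torch.rand_like(x) - 1) * epsilon
        sample = (x + noise).clamp(0.0, 1.0).requires_grad_(True)
        # 'penultimate' is an assumed accessor for the penultimate-layer
        # activations; the proof feature is one neuron of that layer.
        feature_value = model.penultimate(sample)[..., feature_index].sum()
        grad = torch.autograd.grad(feature_value, sample)[0]
        grads.append(grad.detach())
    # Average the per-sample gradients to obtain the gradient map.
    return torch.stack(grads).mean(dim=0)
```

Averaging over many samples from φ, rather than taking the gradient at a single point, makes the map reflect the whole input region covered by the proof.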

Source publication (preprint)
In recent years, numerous methods have been developed to formally verify the robustness of deep neural networks (DNNs). Though the proposed techniques are effective in providing mathematical guarantees about the DNNs' behavior, it is not clear whether the proofs generated by these methods are human-interpretable. In this paper, we bridge this gap by...

Contexts in source publication

Context 1
... Fig. 2, we compare the gradient maps corresponding to the top proof feature (the one having the highest priority P_ub(F_ni)) on networks from Table 1 on representative images of different output classes in the MNIST and CIFAR-10 test sets. The experiments lead us to interesting observations - even if some property is verified for both the ...
Context 2
... Gradient maps generated on (a) MNIST networks and (b) CIFAR-10 networks (Figure 4): additional plots for the top proof feature visualization (in addition to Fig. 2), showing the gradient map of the top proof feature (having the highest priority) generated for networks trained with different training methods. It is evident that the top proof feature corresponding to the standard network highlights both relevant and spurious input features. In contrast, the top proof feature of the provably robust ...
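For completeness, here is a minimal sketch of the kind of per-map scaling and clipping mentioned above for visualization; the three-standard-deviation window and the [0, 1] rescaling are illustrative assumptions, not the exact constants used in the paper.

```python
import torch

def scale_and_clip(grad_map: torch.Tensor, num_std: float = 3.0) -> torch.Tensor:
    """Clip a gradient map to +/- num_std standard deviations around its mean,
    then rescale to [0, 1] for display. num_std = 3 is an illustrative choice."""
    mean, std = grad_map.mean().item(), grad_map.std().item()
    clipped = grad_map.clamp(mean - num_std * std, mean + num_std * std)
    # Rescale to [0, 1]; the small constant guards against a constant map.
    return (clipped - clipped.min()) / (clipped.max() - clipped.min() + 1e-12)
```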
