Friday, July 12, 2019

Adversarial training: deep learning book


In Ian Goodfellow's Deep Learning book, p. 261, it is shown how to build an "adversarial example" by adding to an image $x$ a perturbation of the same size: $\epsilon$ times the sign of the gradient of the cost function $J$ with respect to $x$ (for the learned weights $\theta$ of the network, if I understand correctly, i.e. fixed $\theta$). I recall that the adversarial image still looks like a panda to a human; a human cannot see the small noise introduced into the image, but the DNN is fooled by it and classifies it as a gibbon.


In mathematical form: $x_{adversarial} = x + \epsilon \,\operatorname{sign}(\nabla_x J(\theta, x, y))$, where $x$ is the panda image, $y$ is the ground-truth label, and $\theta$ are the DNN's weights, which I assume are the trained weights and will not be touched anymore.
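To make the construction concrete, here is a minimal sketch of this fast gradient sign step, assuming a trained PyTorch classifier (the function name is illustrative; $\epsilon = 0.007$ is the value used in the book's panda example):

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.007):
    # model: trained classifier (its weights theta stay fixed throughout)
    # x:     input image tensor, e.g. shape (1, 3, 224, 224)
    # y:     ground-truth label tensor, e.g. shape (1,)
    x = x.clone().detach().requires_grad_(True)  # track gradients w.r.t. the INPUT
    loss = F.cross_entropy(model(x), y)          # J(theta, x, y)
    loss.backward()                              # populates x.grad with nabla_x J
    # Step every pixel by epsilon in the direction that increases the loss.
    return (x + epsilon * x.grad.sign()).detach()
```

Note that taking the sign caps the change at exactly $\epsilon$ per pixel, which is why the perturbation stays invisible to a human.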


However, I didn't read (or don't think I missed?) WHY this works. Can someone give an "intuitive" explanation of why this procedure makes the DNN classify the panda as a gibbon? It is not intuitive to me why adding this gradient image results in a gibbon being found (a gibbon or any other class).


The book is available here: https://www.deeplearningbook.org/contents/regularization.html (p. 265 in the online version, p. 261 in my physical copy).


Side note: they take the gradient w.r.t. $x$ and not w.r.t. $\theta$, as would be the case for training. And if they were taking it w.r.t. $\theta$, the shape would not even be the same ($\nabla_\theta J$ has the dimensions of the weights, not of the image). But I would like to know why it makes sense to take it w.r.t. $x$, and how that can fool the network into "thinking" that this is another image. (See the sketch of the contrast below.)
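For contrast, a small hedged sketch of the two gradients in code (a hypothetical PyTorch setup; the toy linear model, shapes, and values are only for illustration): training moves $\theta$ downhill on $J$ while $x$ is held fixed, whereas the attack moves $x$ uphill on $J$ while $\theta$ is held fixed.

```python
import torch
import torch.nn.functional as F

# Toy setup (shapes are illustrative): a linear classifier on flat inputs.
model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.rand(1, 784)
y = torch.tensor([3])
epsilon = 0.007

# Training step: gradient w.r.t. the weights theta; x is fixed data.
optimizer.zero_grad()
F.cross_entropy(model(x), y).backward()
optimizer.step()  # moves theta downhill on J

# Attack step: gradient w.r.t. the input x; theta stays fixed.
x = x.clone().detach().requires_grad_(True)
F.cross_entropy(model(x), y).backward()
x_adv = x + epsilon * x.grad.sign()  # moves x uphill on J
```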



