Grad-CAM Implementation in pycaffe

Posted by Sandareka at 9:42 AM. Labels: Caffe, Convolutional Neural Network (CNN), Deep Learning, Envision, Grad-CAM, pycaffe, Python, Technical

You can find the code discussed in this post in this git repository.

This post discusses how to implement the Gradient-weighted Class Activation Mapping (Grad-CAM) approach described in the paper "Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization".

Grad-CAM is a technique that makes Convolutional Neural Network (CNN) based models more interpretable by visualizing the input regions the model looks at while making a prediction.

Grad-CAM model architecture

I'm not going to go deeper into the paper here; for a more detailed explanation, please refer to the paper itself.
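For orientation, the core computation can be sketched in a few lines of NumPy. This is only a minimal sketch on toy arrays, not the pycaffe script from this post: the function name grad_cam_map and the toy inputs are made up for illustration, and it leaves out the resizing and colour mapping done later.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Both inputs have shape (channels, height, width)."""
    # One weight per channel: the spatial average of that channel's gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the activation maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=([0], [0]))
    # ReLU keeps only the regions that raise the class score.
    return np.maximum(cam, 0)

# Toy example (hypothetical values): two 2x2 feature maps.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 2.0]]])
grads = np.array([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam_map(acts, grads)
print(cam)
```

The first channel gets weight 1 and the second gets weight -1, so only the top-left cell survives the ReLU.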

You can find different implementations of this technique in Keras, Torch+Caffe, and TensorFlow.
However, I was not able to find a pycaffe implementation of Grad-CAM on the web. As pycaffe is a commonly used deep learning framework for developing CNN based classification models, it would be useful to have a pycaffe implementation as well.

If you are looking for a quick solution to interpret your Caffe classification model, this post is for you!

Install

If you are completely new to Caffe, refer to the official Caffe page for installation instructions and some tutorials. As we are going to use the Python interface to Caffe (pycaffe), make sure you install pycaffe as well. All the required instructions are given on the Caffe web site.

Implementation

For this implementation I'm using a pretrained image classification model downloaded from the Caffe Model Zoo.

For this example, I will use the BVLC reference CaffeNet model, which is trained to classify images into 1000 classes. To download the model, go to the folder where you installed Caffe, e.g. C:\Caffe, and run

./scripts/download_model_binary.py models/bvlc_reference_caffenet
./data/ilsvrc12/get_ilsvrc_aux.sh

Then let's write the gradCAM.py script:

import numpy as np
import caffe

caffe_root = '--path to caffe installation folder--'

# load the model
net = caffe.Net(caffe_root + '/models/bvlc_reference_caffenet/deploy.prototxt',
                caffe_root + '/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# load input and preprocess it
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_mean('data', np.load(caffe_root + '/python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
transformer.set_transpose('data', (2, 0, 1))     # HxWxC -> CxHxW
transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
transformer.set_raw_scale('data', 255.0)         # [0, 1] -> [0, 255]

# We reshape the data blob as we classify only one image
net.blobs['data'].reshape(1, 3, 227, 227)

# load the image into the data layer of the model
im = caffe.io.load_image(caffe_root + '/examples/images/cat.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', im)

# classify the image
out = net.forward()

# predicted class
print(out['prob'].argmax())
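The printed class index is not very readable on its own. As a rough sketch, the helper below turns a probability vector into the top few (label, score) pairs. The name top_k and the toy inputs are hypothetical; with the real network you would pass out['prob'][0] and a list of label strings, which I assume you read from data/ilsvrc12/synset_words.txt (this file should be fetched by get_ilsvrc_aux.sh).

```python
import numpy as np

def top_k(probs, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    probs = np.asarray(probs).ravel()
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending score
    return [(labels[i], float(probs[i])) for i in order]

# Toy stand-ins for out['prob'][0] and the lines of a label file.
toy_probs = [0.05, 0.70, 0.25]
toy_labels = ["dog", "cat", "fox"]
print(top_k(toy_probs, toy_labels, k=2))
```

Looking at more than one class is handy here, because Grad-CAM can be computed for any class index, not only the top prediction.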

Next we have to calculate the gradient of the predicted class score with respect to the convolution layer of interest. This is the tricky part. Caffe provides a built-in function

 net.backward()

to calculate the gradients of the network. However, if you study the documentation of the backward() function, you will see that this method calculates the gradients of the loss with respect to the input layer (the 'data' layer, as it is commonly called in Caffe).

To implement Grad-CAM we need the gradients of the layer just before the softmax layer with respect to a convolution layer, preferably the last convolution layer. To achieve this you have to modify the deploy.prototxt file: remove the softmax layer and add the following line just after the model name. After this change, reload the network from the modified file and run the forward pass again. The network output is then 'fc8' instead of 'prob', so out['prob'] is no longer available.

 force_backward: true
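If you prefer not to edit the file by hand, the change described above can be sketched as a small text transformation. This is a minimal sketch under a few assumptions: the function name prepare_prototxt is made up, the file uses the layer { ... } syntax with type: "Softmax", and the model name sits on its own top-level name: line. It works on plain text and does not use Caffe's protobuf classes.

```python
def prepare_prototxt(text):
    """Return prototxt text with force_backward set and Softmax layers removed."""
    out_lines = []
    block = []   # lines of the top-level { ... } block being collected
    depth = 0    # current brace nesting depth
    inserted = "force_backward" in text  # do not add the flag twice
    for line in text.splitlines():
        if depth == 0 and "{" not in line:
            out_lines.append(line)
            # Add the flag right after the top-level model name.
            if not inserted and line.strip().startswith("name:"):
                out_lines.append("force_backward: true")
                inserted = True
            continue
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            # Keep the finished block unless it is a Softmax layer.
            if not any('type:"Softmax"' in l.replace(" ", "") for l in block):
                out_lines.extend(block)
            block = []
    return "\n".join(out_lines) + "\n"

sample = '''name: "CaffeNet"
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
'''
result = prepare_prototxt(sample)
print(result)
```

Write the result to a new file rather than overwriting deploy.prototxt, so the original model definition stays available for plain classification.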

Then, using the following code snippet, we can derive Grad-CAM:

import cv2
import numpy as np

# 'net' is the network reloaded from the modified deploy.prototxt and
# 'out' is the result of net.forward() on the preprocessed image.
final_layer = 'fc8'  # output layer whose gradients are being calculated
image_size = (227, 227)  # input image size
feature_map_shape = (13, 13)  # size of the feature map generated by 'conv5'
layer_name = 'conv5'  # convolution layer of interest
# Predicted class; set any other class index here to get its saliency map
category_index = out[final_layer].argmax()

# Make the loss value class specific
label = np.zeros(net.blobs[final_layer].data.shape)
label[0, category_index] = 1

imdiff = net.backward(diffs=['data', layer_name], **{net.outputs[0]: label})
gradients = imdiff[layer_name]  # gradients of the predicted class score w.r.t. the conv5 layer

# Normalizing gradients for better visualization
gradients = gradients / (np.sqrt(np.mean(np.square(gradients))) + 1e-5)
gradients = gradients[0, :, :, :]

print("Gradients Calculated")

activations = net.blobs[layer_name].data[0, :, :, :]

# Calculating importance of each activation map
weights = np.mean(gradients, axis=(1, 2))

# Accumulate the weighted activation maps (initialized to ones)
cam = np.ones(feature_map_shape, dtype=np.float32)

for i, w in enumerate(weights):
    cam += w * activations[i, :, :]

# Let's visualize Grad-CAM
cam = cv2.resize(cam, image_size)
cam = np.maximum(cam, 0)
heatmap = cam / np.max(cam)
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)

# We are going to overlay the saliency map on the image
new_image = cv2.imread('--path to caffe installation folder--/examples/images/cat.jpg')
new_image = cv2.resize(new_image, image_size)

cam = np.float32(cam) + np.float32(new_image)
cam = 255 * cam / np.max(cam)
cam = np.uint8(cam)

# Finally saving the result
cv2.imwrite("gradcam.jpg", cam)

That's it. If everything goes smoothly, you will get the following result.

Input Image

Grad-CAM image
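If your overlay looks washed out or contains invalid values, one thing worth checking is the normalization step, since dividing by np.max(cam) misbehaves when the map is zero everywhere after the ReLU. Below is a small sketch of a guarded normalization and a simple alpha blend. The helper names normalize_heatmap and blend and the alpha value are my own choices for illustration, not part of the script above, and I have not verified that this addresses any particular failure.

```python
import numpy as np

def normalize_heatmap(cam):
    """Apply ReLU and scale to [0, 1]; an all-zero map stays all zero."""
    cam = np.maximum(np.asarray(cam, dtype=np.float32), 0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def blend(image, heat, alpha=0.5):
    """Alpha-blend two uint8 images of identical shape."""
    mixed = (1.0 - alpha) * image.astype(np.float32) + alpha * heat.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)

# Toy data: a map with no positive values, a grey image and a pure-red
# heat image in OpenCV's BGR channel order.
flat = normalize_heatmap(np.full((2, 2), -3.0))
image = np.full((2, 2, 3), 200, dtype=np.uint8)
heat = np.zeros((2, 2, 3), dtype=np.uint8)
heat[..., 2] = 255
overlay = blend(image, heat, alpha=0.5)
print(flat.max(), overlay[0, 0])
```

Compared with adding the two images and rescaling, a fixed alpha keeps the brightness of the photo independent of how strong the heatmap is.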

I hope this will be helpful. If you need any clarification, please feel free to comment below; I'm happy to help.

26 comments:

Unknown, September 5, 2017 at 5:34 AM
Hello. This is very helpful article for studying heat-map.
I would like to ask some questions.
1) filter_shape = (13, 13) #size of the filter of 'conv5'
I don't know how to determine the filter_shape (13,13).
2) Can I get an .deploy file of caffenet? I want to get an example of .deploy.
Thanks you

Unknown, December 11, 2017 at 1:27 AM
Just wonder what the variable named "input_model" in your last snippet is.

Sandareka, January 1, 2018 at 4:35 AM
Hi,
Sorry for the late reply. "input_model" is the same as "net" in the previous snippet. I had not made that clear. Thanks for pointing it out.

Unknown, January 24, 2018 at 5:01 AM
How would one modify this to work with a fine-tuned GoogLeNet model? My challenge comes from the presence of several convolution units in the inception module. So what do I replace conv5 with?

Unknown, March 12, 2018 at 9:34 PM
Hi, I don't understand this part:

"To achieve this you have to modify the deploy.prototxt file. You just have to remove the softmax layer."

Does it mean I should remove this layer from caffenet_deploy.prototxt?

layer {
name: "prob"
type: "Softmax"
bottom: "fc8"
top: "prob"
}

But if I remove this layer, I get a KeyError for "prob".

Anonymous, April 3, 2018 at 6:40 AM
Hello Sandareka,
Thank you very much for this script! It's true that GradCAM is not available in pycaffe, yet it would be a very useful feature to have.
I have tried to implement your script, and I get good classification in terms of classes (281st class for the image of this cat) and an output image which however differs from yours. It looks like it only has a red overlay? Any ideas on how to fix this and what could be the issue? I used the same script you pasted and same models you mention in this article. Thanks!

Uni, April 22, 2018 at 1:33 PM
Hi there! Thanks for the post. I have a conceptual question. Let's suppose I have a multi-stream network where say, I have 10 streams - and the architecture is that of CaffeNet. The streams share params up till fc6 layer (so conv1-fc6). And after fc6, there is a concat layer that combines all 10 fc6 activations...and then fc7,fc8 and softmax and this output is one label. Now my goal is to envision 10 heatmaps (one for each input for the stream) using Grad-CAM. I am a little unsure as to whether this technique will work and generate 10 heatmaps for me? It seems to work for single input, single output only. Any thoughts? Thanks!

Unknown, June 22, 2018 at 12:11 PM
Hi, thanks for your great post!

I found that the filter_shape = (13, 13) means size of the filter output (feature map) size rather than the filter size. Am I right?

Thanks~

Unknown, August 7, 2018 at 2:18 AM
Hello, your blog helps me a lot. But when I use Grad-CAM for MNIST (LeNet), it shows this:
"
I0807 22:31:13.175235 6987 net.cpp:242] This network produces output ip2
I0807 22:31:13.175241 6987 net.cpp:255] Network initialization done.
I0807 22:31:13.180763 6987 net.cpp:744] Ignoring source layer mnist
I0807 22:31:13.181074 6987 net.cpp:744] Ignoring source layer loss
0
Gradients Calculated
"
I tried to solve it by looking at net.cpp, but I have no idea. Could you give me some help?

Unknown, October 31, 2018 at 8:36 AM
Thanks a lot for your effort. This code helps me a lot, and it now works very well.
By the way, I would suggest that you
add this code
gradients = np.maximum(gradients, 0)
after
gradients = imdiff[conv_layer]

so only the positive gradient can be used, because positive gradient is key feature, not positive , this is part of guided BP idea in Grad CAM paper.

this is part of the key modification:

out = model.forward()
label = np.zeros(net.blobs[last_layer].shape)
category_index = out[last_layer].argmax()
label[0, category_index] = 1
#get gradient image
imdiff = net.backward(diffs=['data', conv_layer], **{net.outputs[0]: label})
gradients = imdiff[conv_layer] # gradients of the loss value/ predicted class score w.r.t conv5 layer

gradients = np.maximum(gradients, 0)
# Normalizing gradients for better visualization
gradients = gradients / (np.sqrt(np.mean(np.square(gradients))) + 1e-5)
gradients = gradients[0, :, :, :]

Unknown, October 31, 2018 at 8:38 AM
sorry, wrong word,correct sentence should be :

so only the positive gradient can be used, because positive gradient is key feature, not negative , this is part of guided BP idea in Grad CAM paper.

Anonymous, May 26, 2020 at 11:10 PM
Thank you:)