Grad-CAM Implementation in pycaffe

You can find the code discussed in this post in this git repository.

This post discusses how to implement the Gradient-weighted Class Activation Mapping (Grad-CAM) approach described in the paper Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization.

Grad-CAM is a technique that makes Convolutional Neural Network (CNN) based models more interpretable by visualizing the input regions the model looks at while making a prediction.

Grad-CAM model architecture

I'm not going to go deeper into the paper; for a more detailed explanation, please refer to the paper.

You can find different implementations of this technique in Keras, Torch+Caffe, and TensorFlow.
However, I was not able to find a pycaffe implementation of Grad-CAM on the web. As pycaffe is a commonly used deep learning framework for CNN-based classification model development, it would be useful to have a pycaffe implementation as well.

If you are looking for a quick solution to interpret your Caffe classification model, this post is for you!


If you are completely new to Caffe, refer to the official Caffe page for installation instructions and some tutorials. As we are going to use the Python interface to Caffe (pycaffe), make sure you install pycaffe as well. All the required instructions are given on the Caffe web site.


For this implementation I'm using a pretrained image classification model downloaded from the community in Caffe Model Zoo.

For this example, I will use the BVLC reference CaffeNet model, which is trained to classify images into 1000 classes. To download the model, go to the folder where you installed Caffe, e.g. C:\Caffe, and run

 ./scripts/ models/bvlc_reference_caffenet

Then let's write the script

# imports used throughout this post
import numpy as np
import cv2
import caffe

# load the model in test (inference) mode
net = caffe.Net('---path to caffe installation folder---/models/bvlc_reference_caffenet/deploy.prototxt',
                '---path to caffe installation folder---/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# load input and preprocess it
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_mean('data', np.load('--path to caffe installation folder--/python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1))
transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)

# We reshape the data blob as we classify only one image
net.blobs['data'].reshape(1, 3, 227, 227)

# load the image to the data layer of the model
im = caffe.io.load_image('--path to caffe installation folder--/examples/images/cat.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', im)

# classify the image
out = net.forward()

# predicted class
print(out['prob'].argmax())
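To make the preprocessing step less of a black box, here is a minimal NumPy sketch of what the transformer settings above amount to, assuming the image is already at the network input size (so no resizing is shown). The function name preprocess_like_transformer and the toy mean values are made up for illustration; this is not Caffe's own code.

```python
import numpy as np

def preprocess_like_transformer(image_rgb, mean_bgr, raw_scale=255.0):
    """Mimic the transformer settings used in this post (illustrative only).

    image_rgb: H x W x 3 float array in [0, 1], RGB order.
    mean_bgr:  per-channel mean in B, G, R order.
    Returns a 3 x H x W float32 array in BGR order.
    """
    chw = image_rgb.astype(np.float32).transpose(2, 0, 1)  # set_transpose (2,0,1): HWC -> CHW
    bgr = chw[[2, 1, 0], :, :]                             # set_channel_swap (2,1,0): RGB -> BGR
    scaled = bgr * raw_scale                               # set_raw_scale: [0, 1] -> [0, 255]
    mean = np.asarray(mean_bgr, dtype=np.float32)[:, None, None]
    return scaled - mean                                   # set_mean: subtract per-channel mean

# Toy input: a 4 x 4 pure red image and a hypothetical mean (B, G, R).
image = np.zeros((4, 4, 3), dtype=np.float32)
image[..., 0] = 1.0
blob = preprocess_like_transformer(image, mean_bgr=[10.0, 20.0, 30.0])
print(blob.shape)
```

After the channel swap, the red channel ends up last, which is why the mean must also be given in BGR order.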

Next we have to calculate the gradient of the predicted class score w.r.t. the convolution layer of interest. This is the tricky part. The Caffe framework provides an inbuilt function

 net.backward()

to calculate gradients of the network. However, if you study the documentation of the backward() function, you will see that by default this method calculates gradients of the loss w.r.t. the input layer (or, as it is commonly called in Caffe, the 'data' layer).
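For reference, these are the two quantities from the Grad-CAM paper that the rest of this post computes. Here y^c is the score of class c before the softmax, A^k is the k-th feature map of the chosen convolution layer, and Z is the number of spatial positions in that feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right)
```

So what we need from Caffe is the gradient of a class score with respect to an intermediate layer, not with respect to the input image.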

To implement Grad-CAM we need the gradients of the layer just before the softmax layer (the class scores) with respect to a convolution layer, preferably the last convolution layer. To achieve this you have to modify the deploy.prototxt file: remove the softmax layer and add the following line just after the model name.

 force_backward: true
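If you prefer not to edit the file by hand, the two changes could also be scripted. The following is only a sketch that works on the prototxt text directly. The helper name prepare_deploy_for_gradcam is made up for this example, and it assumes that layers are written as top-level layer { ... } blocks, that the softmax layer contains the text type: "Softmax", and that no braces appear inside comments.

```python
def prepare_deploy_for_gradcam(prototxt_text):
    """Return prototxt text with force_backward enabled and Softmax layers removed.

    Illustrative text-based helper, not part of Caffe.
    """
    output, block, depth = [], [], 0
    flag_added = "force_backward" in prototxt_text  # do not add the flag twice
    for line in prototxt_text.splitlines():
        stripped = line.strip()
        if depth == 0 and "{" not in stripped:
            # Plain top-level line (e.g. the model name): copy it through.
            output.append(line)
            if stripped.startswith("name:") and not flag_added:
                output.append("force_backward: true")
                flag_added = True
            continue
        # Inside (or opening) a top-level block: collect it until braces balance.
        block.append(line)
        depth += stripped.count("{") - stripped.count("}")
        if depth == 0:
            is_softmax = (block[0].strip().startswith("layer")
                          and any('type: "Softmax"' in b for b in block))
            if not is_softmax:
                output.extend(block)
            block = []
    return "\n".join(output) + "\n"

# Tiny made-up network definition, just to show the effect.
example = '''name: "TinyNet"
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "data"
  top: "fc8"
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
'''
modified = prepare_deploy_for_gradcam(example)
print(modified)
```

Write the result to a new file rather than overwriting the original deploy.prototxt, so you can still run normal classification with the softmax output.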

Then by using the following code snippet we can derive Grad-CAM

final_layer = "fc8"  # output layer whose gradients are being calculated
image_size = (227, 227)  # input image size
feature_map_shape = (13, 13)  # size of the feature map generated by 'conv5'
layer_name = 'conv5'  # convolution layer of interest
# Class to explain: here the predicted class; set any other class index to get its saliency map.
# 'out' must come from net.forward() on the network loaded from the modified deploy.prototxt.
category_index = out['fc8'].argmax()

# Make the loss value class specific: a one-hot vector at the output layer
label = np.zeros(net.blobs[final_layer].shape)
label[0, category_index] = 1

# Backpropagate the one-hot vector and also return the diffs of the chosen layers
imdiff = net.backward(diffs=['data', layer_name], **{net.outputs[0]: label})
gradients = imdiff[layer_name]  # gradients of the predicted class score w.r.t. the conv5 layer

#Normalizing gradients for better visualization
gradients = gradients/(np.sqrt(np.mean(np.square(gradients)))+1e-5)
gradients = gradients[0,:,:,:]

print("Gradients Calculated")

activations = net.blobs[layer_name].data[0, :, :, :] 

#Calculating importance of each activation map
weights = np.mean(gradients, axis=(1, 2))

cam = np.zeros(feature_map_shape, dtype=np.float32)  # start from zeros so cam is exactly the weighted sum of the activation maps

for i, w in enumerate(weights):
    cam += w * activations[i, :, :]    

#Let's visualize Grad-CAM
cam = cv2.resize(cam, image_size)
cam = np.maximum(cam, 0)
heatmap = cam / np.max(cam)
cam = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET) 

#We are going to overlay the saliency map on the image
new_image = cv2.imread('--path to caffe installation folder--/examples/images/cat.jpg')
new_image = cv2.resize(new_image, image_size)

cam = np.float32(cam) + np.float32(new_image)
cam = 255 * cam / np.max(cam)
cam = np.uint8(cam)

#Finally saving the result
cv2.imwrite("gradcam.jpg", cam) 
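To see the core computation in isolation, here is a small NumPy-only sketch of the weighting, summation, ReLU and normalization steps on toy arrays, with no Caffe or OpenCV involved. The function name grad_cam_map and the toy values are made up for illustration. It starts from a plain weighted sum and guards against an all-zero map, which is a slight simplification of the script above.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Compute a normalized Grad-CAM style heatmap (illustrative sketch).

    activations, gradients: arrays of shape (K, H, W) for one image.
    Returns an H x W array with values in [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))              # one importance weight per feature map
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over the K maps -> (H, W)
    cam = np.maximum(cam, 0)                           # keep only positive evidence (ReLU)
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # avoid dividing by zero

# Toy example: two 3 x 3 feature maps.
activations = np.zeros((2, 3, 3), dtype=np.float32)
activations[0, 1, 1] = 4.0   # map 0 fires in the centre
activations[1, 0, 0] = 2.0   # map 1 fires in the top-left corner
gradients = np.stack([np.full((3, 3), 0.5),    # map 0 supports the class
                      np.full((3, 3), -1.0)])  # map 1 counts against it
heatmap = grad_cam_map(activations, gradients.astype(np.float32))
print(heatmap)
```

Only the centre pixel survives here, because the second feature map has a negative weight and its contribution is removed by the ReLU.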

That's it. If everything goes smoothly you will get the following result.

Input Image

Grad-CAM image

Hope this will be helpful. If you need any clarification please feel free to comment below, I'm happy to help you.


  1. Hello. This is a very helpful article for studying heat maps.
    I would like to ask some questions.
    1) filter_shape = (13, 13) #size of the filter of 'conv5'
    I don't know how to determine the filter_shape (13,13).
    2) Can I get a deploy file for CaffeNet? I would like to see an example of one.
    Thank you

    1. Hello,

      I have used a pre-trained network from Caffe Model Zoo. Please follow this page for more information on how to use a pre-trained network.
      Once you have the deploy.prototxt file, you can select one of the convolutional layers for GradCAM implementation. In the description of that layer, you will see the kernel size.
      Hope this helps you!

  2. This comment has been removed by the author.

  3. Just wonder what the variable named "input_model" in your last snippet is.

  4. Hi,
    Sorry for the late reply. "input_model" is same as "net" in the previous snippet. I have not clarified it. Thanks for pointing out.

  5. How would one modify this to work with a fine-tuned GoogLeNet model? My challenge is emanating from the presence of several convolution units in the inception module. So what do I replace conv5 with?

    1. Hi,
      Sorry for the late reply.
      I haven't worked with GoogLeNet. However, did you try to calculate gradients of the class score with respect to the last filter concatenation layer? I believe it would give you a reasonable visualization.

  6. HI, I don't understand it,

    "To achieve this you have to modify the deploy.prototxt file. You just have to remove the softmax layer."

    Does it mean I should remove this layer from caffenet_deploy.prototxt?

    layer {
    name: "prob"
    type: "Softmax"
    bottom: "fc8"
    top: "prob"
    }

    But if I remove this layer,
    I get a KeyError for "prob".

    1. Hi,

      Yes, you are correct. You have to remove the definition of the last layer from the deploy.prototxt file. After you remove it, just change
      category_index = out['prob'].argmax() to
      category_index = out['fc8'].argmax().
      You are getting the "prob" KeyError since now you don't have that layer.
      I hadn't updated it in my post. Now it is fixed. Thank you for pointing it out.

  7. Hello Sandareka,
    Thank you very much for this script! It's true that GradCAM is not available in pycaffe, yet it would be a very useful feature to have.
    I have tried to implement your script, and I get good classification in terms of classes (281st class for the image of this cat) and an output image which however differs from yours. It looks like it only has a red overlay? Any ideas on how to fix this and what could be the issue? I used the same script you pasted and same models you mention in this article. Thanks!

    1. Hi,

      If you used the same models as mine, the cause of a completely red overlay could be some issue with normalization. Please debug your code and check whether all values in the heatmap are very close to one. If that is the case, something has gone wrong in the normalization.
      BTW, which Caffe version are you using?

  8. Hi there! Thanks for the post. I have a conceptual question. Let's suppose I have a multi-stream network where say, I have 10 streams - and the architecture is that of CaffeNet. The streams share params up till fc6 layer (so conv1-fc6). And after fc6, there is a concat layer that combines all 10 fc6 activations...and then fc7,fc8 and softmax and this output is one label. Now my goal is to envision 10 heatmaps (one for each input for the stream) using Grad-CAM. I am a little unsure as to whether this technique will work and generate 10 heatmaps for me? It seems to work for single input, single output only. Any thoughts? Thanks!

    1. Hi,

      Sorry for the late reply.

      I believe that you can generate 10 heatmaps. The fundamental idea here is that we are interested in finding impact of each input feature towards the output. Since you can backpropagate gradients of the output with respect to the final convolutional layer of each stream, you should be able to generate a different heatmap for each stream. However, I haven't tried this. You are welcome to try it and please let me know the outcome!

  9. Hi, thanks for your great post!

    I found that filter_shape = (13, 13) means the size of the filter output (feature map) rather than the filter size. Am I right?


    1. Hi,

      Yes, you are correct. Sorry for the wrong wording. I updated the post. Thanks for pointing it out.

  10. Hello, your blog helps me a lot. But when I use Grad-CAM for MNIST (LeNet), it shows this: "
    I0807 22:31:13.175235 6987 net.cpp:242] This network produces output ip2
    I0807 22:31:13.175241 6987 net.cpp:255] Network initialization done.
    I0807 22:31:13.180763 6987 net.cpp:744] Ignoring source layer mnist
    I0807 22:31:13.181074 6987 net.cpp:744] Ignoring source layer loss
    Gradients Calculated
    ". I tried to solve it by looking at net.cpp, but I have no idea. Could you give me some help?

    1. Hi,

      I'm happy to help you. However, can you please tell me what exactly your issue is? Didn't your script create the visualization? The snippet you have provided is not enough to understand the issue.

    2. log is here:"I0810 22:21:44.973631 4538 net.cpp:242] This network produces output ip2
      I0810 22:21:44.973639 4538 net.cpp:255] Network initialization done.
      I0810 22:21:44.978549 4538 net.cpp:744] Ignoring source layer mnist
      I0810 22:21:44.978878 4538 net.cpp:744] Ignoring source layer loss
      Gradients Calculated"
      My script creates the visualization, but the result looks abnormal: the MNIST picture I used just gets a red overlay and the whole picture turns red, although the code does identify which digit it is.

    3. Hi,

      Please check whether you have done the normalization correctly. This can happen when normalization has not been done properly. Further, in my experience, Grad-CAM sometimes doesn't provide the best visualization from the last convolutional layer. Maybe you can try another inner layer and check.

      Hope this will help you!

    4. Hi, thank you very much, I have solved the problem. I did not add "force_backward: true" to the prototxt file, so it could not do the deconvolution correctly.

    5. Hi,

      Okay, glad you could solve your problem! :)

  11. This comment has been removed by the author.

  12. Thanks a lot for your effort, this code helps me a lot, and now it works very well.
    By the way, I would suggest adding this code:
    gradients = imdiff[conv_layer]
    gradients = np.maximum(gradients, 0)

    so only the positive gradient can be used, because positive gradient is key feature, not positive , this is part of guided BP idea in Grad CAM paper.

    this is part of the key modification:

    out = net.forward()
    label = np.zeros(net.blobs[last_layer].shape)
    category_index = out[last_layer].argmax()
    label[0, category_index] = 1
    #get gradient image
    imdiff = net.backward(diffs=['data', conv_layer], **{net.outputs[0]: label})
    gradients = imdiff[conv_layer] # gradients of the loss value/ predicted class score w.r.t conv5 layer

    gradients = np.maximum(gradients, 0)
    # Normalizing gradients for better visualization
    gradients = gradients / (np.sqrt(np.mean(np.square(gradients))) + 1e-5)
    gradients = gradients[0, :, :, :]

  13. sorry, wrong word,correct sentence should be :

    so only the positive gradient can be used, because positive gradient is key feature, not negative , this is part of guided BP idea in Grad CAM paper.