Hierarchical neural-net interpretations (ACD)

Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Official code for "Hierarchical interpretations for neural network predictions" (ICLR 2019 pdf).

Documentation • Demo notebooks

Note: this repo is actively maintained. For any questions please file an issue.


  • installation: pip install acd (or clone and run python setup.py install)
  • examples: the reproduce_figs folder has notebooks with many demos
  • src: the acd folder contains the source for the method implementation
  • allows for different types of interpretations by changing hyperparameters (explained in examples)
  • all required data/models/code for reproducing are included in the dsets folder

Example applications: inspecting NLP sentiment models, detecting adversarial examples, analyzing imagenet models

notes on using ACD on your own data

  • the current CD implementation often works out of the box, especially for networks built on common layers, such as alexnet/vgg/resnet. However, if you have custom layers or layers not accessible in net.modules(), you may need to write a custom function to iterate through some layers of your network (for examples, see cd.py).
  • to use baselines such as build-up and occlusion, replace the pred_ims function with a function that gets predictions from your model given a batch of examples.
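
The two notes above can be sketched roughly as follows. This is a minimal illustration on a toy network, not the code in cd.py: the helper names leaf_layers, predict_batch and occlusion_score are hypothetical, the real pred_ims signature may differ, and the occlusion score shown (drop in target-class probability when a region is zeroed) is just one common form of that baseline.

```python
import torch
import torch.nn as nn


def leaf_layers(net: nn.Module) -> list:
    """List the modules that have no children (hypothetical helper).

    net.modules() walks the module tree in registration order, which is
    not guaranteed to match the order used in forward(), and it cannot
    see functional calls such as F.relu made inside forward().
    """
    return [m for m in net.modules() if len(list(m.children())) == 0]


def predict_batch(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Return class probabilities for a batch (hypothetical stand-in for pred_ims)."""
    model.eval()
    with torch.no_grad():
        logits = model(batch)
    return torch.softmax(logits, dim=1)


def occlusion_score(model: nn.Module, x: torch.Tensor,
                    mask: torch.Tensor, target: int) -> float:
    """Drop in the target-class probability when the masked entries are zeroed."""
    full = predict_batch(model, x.unsqueeze(0))[0, target]
    occluded = predict_batch(model, (x * (1 - mask)).unsqueeze(0))[0, target]
    return (full - occluded).item()


# Toy model with a nested container, standing in for your own network.
torch.manual_seed(0)
net = nn.Sequential(
    nn.Flatten(),
    nn.Sequential(nn.Linear(16, 8), nn.ReLU()),
    nn.Linear(8, 3),
)

layers = leaf_layers(net)          # Flatten, Linear, ReLU, Linear
x = torch.randn(1, 4, 4)           # one single-channel 4x4 "image"
mask = torch.zeros_like(x)
mask[:, :2, :2] = 1                # occlude the top-left 2x2 patch

probs = predict_batch(net, x.unsqueeze(0))
score = occlusion_score(net, x, mask, target=0)
```

If your network applies operations in forward() that are not registered modules, a list like the one returned by leaf_layers will miss them, which is the case where a hand-written iteration is needed.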

related work

  • CDEP (ICML 2020 pdf, github) – penalizes CD / ACD scores during training to make models generalize better
  • TRIM (ICLR 2020 workshop pdf, github) – using simple reparameterizations, allows for calculating disentangled importances to transformations of the input (e.g. assigning importances to different frequencies)
  • PDR framework (PNAS 2019 pdf) – an overarching framework for guiding and framing interpretable machine learning
  • DAC (arXiv 2019 pdf, github) – finds disentangled interpretations for random forests
  • Baseline interpretability methods – the file scores/score_funcs.py also contains simple pytorch implementations of baseline methods, including the simple interpretation technique gradient * input
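
For readers unfamiliar with gradient * input, the idea can be sketched in a few lines of pytorch. This is an illustrative version written for this note, not the code in scores/score_funcs.py; the function name gradient_times_input and the toy linear model are assumptions.

```python
import torch
import torch.nn as nn


def gradient_times_input(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Elementwise product of d(logit[target])/dx and x, for a single example x."""
    x = x.clone().detach().requires_grad_(True)
    logit = model(x.unsqueeze(0))[0, target]     # scalar logit of the target class
    (grad,) = torch.autograd.grad(logit, x)      # same shape as x
    return (grad * x).detach()


# Toy check: for a linear layer the gradient of logit k w.r.t. x is weight[k],
# so the scores should equal weight[k] * x.
torch.manual_seed(0)
linear = nn.Linear(5, 2)
x = torch.randn(5)
scores = gradient_times_input(linear, x, target=1)
expected = linear.weight[1].detach() * x
```

For a purely linear model the scores plus the bias sum to the logit itself; for deeper networks this no longer holds exactly, which is one reason the method is treated only as a simple baseline.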


  • feel free to use/share this code openly
  • if you find this code useful for your research, please cite the following:

   title={Hierarchical interpretations for neural network predictions},
   author={Chandan Singh and W. James Murdoch and Bin Yu},
   booktitle={International Conference on Learning Representations},


GitHub - csinva/hierarchical-dnn-interpretations: Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" (ICLR 2019)