Explaining Graph Neural Networks


Graph Neural Networks are an up-and-coming class of neural networks that operate on graphs and can therefore deal with connected, highly complex data. As explaining neural networks becomes more and more important, we investigate different ways to explain graph neural networks and contrast gradient-based explanations with the interpretability by design approach KEdge. We extend KEdge, to work with probability distributions different from HardKuma. Our goal is to test the performance of each method to judge which one works best under given circum- stances. For this, we extend the notion of fidelity from hard attribution weights to soft attribution weights and use the resulting metric to evaluate the explanations generated by KEdge, as well as by the gradient-based techniques. We also compare the predictive performance of models that use KEdge with different distributions. Our experiments are run on the Cora, SightSeer, Pubmed, and MUTAG datasets. We find that KEdge outperforms the gradient based attribution techniques on graph classification problems and that it should be used with the HardNormal, HardKuma, or HardLaplace distributions, depending on if the top priority is model performance or attribution quality. To compare different metrics of judging attributions in the text domain, we visualize attribution weights generated by different models and find, that metrics which compare model attributions to human explanations lead to bad attribution weights.

Bachelor Thesis


In this bachelor thesis, we explore and evaluate different methods of explaining Graph Neural Networks (GNNs). Graph Neural Networks are an emerging class of neural networks, that take graphs as their input data. This is especially useful since graphs are highly flexible and powerful data structures, that can therefore express a set of different datapoints with complex relationships between them. The motivation for developing graph neural networks comes from the overwhelming success of convolutional neural networks, which can be seen as a special case of GNNs, operating on pictures by exploiting neighborhood information, which can also be expressed as a graph. Today graph neural networks are used in a wide array of domains, like the prediction of molecular properties in chemistry, drug discovery, or even diagnosis in medicine, to model the spread of disease, in recommendation systems, or natural language processing.

But why would one want to explain these networks? Methods for explaining neural models are used to perform a wide amount of tasks. The first one is to debug the model and increase performance, as explanation methods can uncover model bias or spurious correlations in the training data. These are then used to clean up or expand the training data or to adjust the model class, to archive better performance and generalization.

$A$PositiveEven though the Icelandic scenery is incredibly stunning, the story can’t keep up, and therefore the overall experience is boring.
$B$NegativeEven though the Icelandic scenery is incredibly stunning, the story can’t keep up, and therefore the overall experience is boring.
$C$NegativeEven though the Icelandic scenery is incredibly stunning, the story can’t keep up, and therefore the overall experience is boring.

For example, if we want to categorize reviews into positive and negative ones, we are also interested in exactly why our model decides that a given review is positive or negative. Using this information we can more accurately judge the models’ performance, by checking if its predictions are correct for the right reasons. The example explanations in the table above reveal that model $B$ is correct for the wrong reason, while model $C$ is correct for the right reason. Therefore, model $C$ should be deployed over model $B$ since we can expect $C$ to generalize better to new, unseen data.

A second application area of explanations is to assess the suitability of a model for use in the real world. This is especially important in high-stakes environments, such as medicine or law enforcement, where graph neural networks are used. Therefore, explainability is also part of the approval process by a regulatory authority, like the European Union, or in some companies. Another way explanation techniques are useful is by hinting on what to change in the input, to receive a different model output. This is useful for example in loan approval if the client wants to receive information on what factors to change to be approved.

One distinguishes two forms of explanations: global and local ones. While global explanations are ways of explaining the model as a whole, it is often not feasible to construct such global explanations, especially when using a model with a lot of parameters, since it’s just too complex to be understood as a whole. Therefore, we want to focus on local explanations. These don’t attempt to explain the whole model, but just a single decision of the model given a certain input. The explanations in the table above are local ones.

Now it begs the question, how one can explain the decisions of a graph neural network. To answer this question we will lay out the relevant techniques to generate attribution weights, as well as expand on them. Attribution weights are ways of explaining neural models by associating a weight with different parts, or tokens, of the models’ input. These tokens could be pixels in a picture, words in a text, or nodes and/or edges in a graph. The parts with high weight are seen as more important for the models’ decision than those with low weights. If all generated weights are zero or one, the technique is called a hard attribution technique. These mark relevant parts of the input, as is the case in the table above. When a range of real numbers is allowed as weights, the attributions are called soft. We will focus on soft attribution techniques, as these provide a relation of importance on the inputs’ tokens.

The second question that arises is, which technique should one use to explain GNNs and how to judge if one technique is better than another? To answer these questions, we first establish and explain the notion of graph neural networks, as well as different architectures. Then we introduce some gradient-based attribution techniques and the interpretability by design approach of KEdge. KEdge was introduced by Rathee et al. in 2021. It works by sampling a mask for the edges of a graph via an approximation of the Bernoulli distribution. This mask can then be used to generate attribution weights. In the original paper, this approximation is based on the Kumaraswamy, or Kuma, distribution. In our third chapter, we define some probability distributions to construct different approximations of the Bernoulli distribution, that we can use with KEdge. We also talk about how to obtain node-level attribution weights from KEdge. Then we introduce some metrics to measure the performance of the different attribution techniques, and in particular, we extend the notion of fidelity to soft attribution techniques by introducing integrated fidelity.

In the main part of this thesis, we conduct three experiments. The first two, to evaluate and compare the attribution techniques, as well as to see, what effects KEdge has on a model’s performance. Here, we compare the accuracy of different models with and without KEdge, to see if there is a noticeable difference, depending on which underlying probability distribution we used. We also compare the integrated fidelity values of all the attribution techniques we introduced before. This is done on the node classification datasets Pubmed, Cora, and CiteSeer and the graph classification dataset MUTAG. In the last experiment, we use our methods on a text dataset of movie reviews, to be able to visualize attribution weights and compare different metrics of evaluating attribution weights.

For more information, see the full thesis.


(Rathee et al., 2021) Mandeep Rathee et al. ‘‘Learned Sparsification for Interpretable Graph Neural Networks’’. In: arXiv: 2106.12920, 2021.

Tobias Christian Nauen
Tobias Christian Nauen
PhD Student

My research interests include efficiency of machine learning models, multimodal learning, and transformer models.