Evaluating and Aggregating Feature-based Model Explanations
Umang Bhatt1,2, Adrian Weller1,3 and José M. F. Moura2
1University of Cambridge
2Carnegie Mellon University
3The Alan Turing Institute
{usb20, aw665}@cam.ac.uk, moura@ece.cmu.edu
Abstract
A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point. As the number of proposed explanation functions grows, we lack quantitative evaluation criteria to help practitioners know when to use which explanation function. This paper proposes quantitative evaluation criteria for feature-based explanations: low sensitivity, high faithfulness, and low complexity. We devise a framework for aggregating explanation functions. We develop a procedure for learning an aggregate explanation function with lower complexity and then derive a new aggregate Shapley value explanation function that minimizes sensitivity.
1 Introduction
There has been great interest in understanding black-box machine learning models via post-hoc explanations. Much of this work has focused on feature-level importance scores for how much a given input feature contributes to a model's output. These techniques are popular amongst machine learning scientists who want to sanity check a model before deploying it in the real world [Bhatt et al., 2020]. Many feature-based explanation functions are gradient-based techniques that analyze the gradient flow through a model to determine salient input features [Shrikumar et al., 2017; Sundararajan et al., 2017]. Other explanation functions perturb input values to a reference value and measure the change in the model's output [Štrumbelj and Kononenko, 2014; Lundberg and Lee, 2017].
With many candidate explanation functions, machine learning practitioners find it difficult to pick which explanation function best captures how a model reaches a specific output for a given input. Though there has been work in qualitatively evaluating feature-based explanation functions on human subjects [Lage et al., 2019], there has been little exploration into formalizing quantitative techniques for evaluating model explanations. Recent work has created auxiliary tasks to test if attribution is assigned to relevant inputs [Yang and Kim, 2019] and has developed tools to verify if the features important to an explanation function are relevant to the model itself [Camburu et al., 2019].
Borrowing from the humanities, we motivate three criteria for assessing a feature-based explanation: sensitivity, faithfulness, and complexity. Philosophy of science research has advocated for explanations that vary proportionally with changes in the system being explained [Lipton, 2003]; as such, explanation functions should be insensitive to perturbations in the model inputs, especially if the model output does not change. Capturing relevancy faithfully is helpful in an explanation [Ruben, 2015]. Since humans cannot process a lot of information at once, some have argued for minimal model explanations that contain only relevant and representative features [Batterman and Rice, 2014]; therefore, an explanation should not be complex (i.e., it should use few features).
In this paper, we first define these three distinct criteria: low sensitivity, high faithfulness, and low complexity. With many explanation function choices, we then propose methods for learning an aggregate explanation function that combines explanation functions. If we want to find the simplest explanation from a set of explanations, then we can aggregate explanations to minimize the complexity of the resulting explanation. If we want to learn a smoother explanation function that varies slowly as inputs are perturbed, we can leverage an aggregation scheme that learns a less sensitive explanation function. To the best of our knowledge, we are the first to rigorously explore aggregation of various explanations, while placing explanation evaluation on an objective footing. To that end, we highlight the contributions of this paper:

• We describe three desirable criteria for feature-based explanation functions: low sensitivity, high faithfulness, and low complexity.
• We develop an aggregation framework for combining explanation functions.
• We create two techniques that reduce explanation complexity by aggregating explanation functions.
• We derive an approximation for Shapley-value explanations by aggregating explanations from a point's nearest neighbors, minimizing explanation sensitivity and resembling how humans reason in medical settings.
2 Preliminaries
Restricting to supervised classification settings, let $f$ be a black box predictor that maps an input $x \in \mathbb{R}^d$ to an output $f(x) \in \mathcal{Y}$. An explanation function $g$ from a family of explanation functions, $\mathcal{G}$, takes in a predictor $f$ and a point of interest $x$ and returns importance scores $g(f, x) = \phi_x \in \mathbb{R}^d$ for all features, where $g(f, x)_i = \phi_{x,i}$ (simplified to $\phi_i$ in context) is the importance of (or attribution for) feature $x_i$ of $x$. By $g_j$, we refer to a particular explanation function, usually from a set of explanation functions $\mathcal{G}_m = \{g_1, g_2, \dots, g_m\}$. We denote $D : \mathbb{R}^d \times \mathbb{R}^d \mapsto \mathbb{R}_{\geq 0}$ to be a distance metric over explanations, while $\rho : \mathbb{R}^d \times \mathbb{R}^d \mapsto \mathbb{R}_{\geq 0}$ denotes a distance metric over the inputs. An evaluation criterion $\mu$ takes in a predictor $f$, explanation function $g$, and input $x$, and outputs a scalar: $\mu(f, g; x)$. $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ refers to a dataset of input-output pairs, and $\mathcal{D}_x$ denotes all $x_i$ in $\mathcal{D}$.
3 Evaluating Explanations
With the number of techniques to develop feature-level explanations growing in the explainability literature, picking which explanation function $g$ to use can be difficult. In order to study the aggregation of explanation functions, we define three desiderata of an explanation function $g$.
3.1 Desideratum: Low Sensitivity
We want to ensure that, if inputs are near each other and their model outputs are similar, then their explanations should be close to each other. Assuming $f$ is differentiable, we desire an explanation function $g$ to have low sensitivity in the region around a point of interest $x$, implying local smoothness of $g$. While [Melis and Jaakkola, 2018] codified the property, [Ghorbani et al., 2019] empirically tested explanation function sensitivity. We follow the convention of the former and define max sensitivity and average sensitivity in the neighborhood of a point of interest $x$.

Let $N_r = \{z \in \mathcal{D}_x \mid \rho(x, z) \leq r,\ f(x) = f(z)\}$ be a neighborhood of datapoints within a radius $r$ of $x$.

Definition 1 (Max Sensitivity). Given a predictor $f$, an explanation function $g$, distance metrics $D$ and $\rho$, a radius $r$, and a point $x$, we define the max sensitivity of $g$ at $x$ as:
$$\mu_M(f, g, r; x) = \max_{z \in N_r} D\big(g(f, x), g(f, z)\big)$$

Definition 2 (Average Sensitivity). Given a predictor $f$, an explanation function $g$, distance metrics $D$ and $\rho$, a radius $r$, and a distribution $\mathbb{P}_x(\cdot)$ over the inputs centered at point $x$, we define the average sensitivity of $g$ at $x$ as:
$$\mu_A(f, g, r; x) = \int_{z \in N_r} D\big(g(f, x), g(f, z)\big)\, \mathbb{P}_x(z)\, dz$$
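In practice, both criteria can be estimated by sampling a neighborhood of $x$ and comparing explanations. The following is a minimal Python sketch of such an estimate; the function and argument names are illustrative, the default metrics merely stand in for $\rho$ and $D$, and the average is taken under a uniform $\mathbb{P}_x$ over the sampled neighborhood.

import numpy as np

def estimate_sensitivity(g, f, x, candidates, radius, rho=None, D=None):
    # Distance over inputs (rho) and over explanations (D); both defaults are
    # placeholders a user would swap for the metrics of Section 2.
    rho = rho or (lambda a, b: np.max(np.abs(a - b)))
    D = D or (lambda a, b: np.linalg.norm(a - b))
    # Neighborhood N_r: candidate points within radius r of x with the same output.
    neighbors = [z for z in candidates if rho(x, z) <= radius and f(x) == f(z)]
    if not neighbors:
        return 0.0, 0.0
    phi_x = g(f, x)
    dists = [D(phi_x, g(f, z)) for z in neighbors]
    # Max sensitivity (Definition 1) and average sensitivity (Definition 2),
    # the latter under a uniform P_x over the sampled neighborhood.
    return max(dists), float(np.mean(dists))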
3.2 Desideratum: High Faithfulness
Faithfulness has been defined in [Yeh et al., 2019]. The feature importance scores from $g$ should correspond to the important features of $x$ for $f$; as such, when we set particular features $x_s$ to a baseline value $\bar{x}_s$, the change in the predictor's output should be proportional to the sum of attribution scores of the features in $x_s$. We measure this as the correlation between the sum of the attributions of $x_s$ and the difference in output when setting those features to a reference baseline. For a subset of indices $S \subseteq \{1, 2, \dots, d\}$, $x_s = \{x_i,\ i \in S\}$ denotes a sub-vector of input features that partitions the input, $x = x_s \cup x_c$. $x_{[x_s = \bar{x}_s]}$ denotes an input where $x_s$ is set to a reference baseline while $x_c$ remains unchanged: $x_{[x_s = \bar{x}_s]} = \bar{x}_s \cup x_c$. When $|S| = d$, $x_{[x_s = \bar{x}_s]} = \bar{x}$.

Remark (Reference Baselines). Recent work has discussed how to pick a proper reference baseline $\bar{x}$. [Sundararajan et al., 2017] suggests using a baseline where $f(\bar{x}) \approx 0$, while others have proposed taking the baseline to be the mean of the training data. [Chang et al., 2019] notes that the baseline can be learned using generative modeling.

Definition 3 (Faithfulness). Given a predictor $f$, an explanation function $g$, a point $x$, and a subset size $|S|$, we define the faithfulness of $g$ to $f$ at $x$ as:
$$\mu_F(f, g; x) = \operatorname{corr}_{S \in \binom{[d]}{|S|}}\left(\sum_{i \in S} g(f, x)_i,\ f(x) - f\big(x_{[x_s = \bar{x}_s]}\big)\right)$$

For our experiments, we fix $|S|$ and then randomly sample subsets $x_s$ of the fixed size from $x$ to estimate the correlation. Since we do not see all $\binom{[d]}{|S|}$ subsets in our calculation of faithfulness, we may not get an accurate estimate of the criterion. Though hard to codify and even harder to aggregate, faithfulness is desirable, as it demonstrates that an explanation captures which features the predictor uses to generate an output for a given input. Learning global feature importances that highlight, in expectation, which features a predictor relies on is a challenging problem left to future work.
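The sampling estimate described above can be sketched in a few lines of Python. The helper names are illustrative, $f$ is assumed to return a scalar (e.g., a logit), and the Pearson correlation mirrors the choice made in Section 7.1; this is only an illustrative sketch, not the exact experimental code.

import numpy as np

def estimate_faithfulness(f, g, x, baseline, subset_size, n_samples=50, seed=0):
    # Monte Carlo estimate of mu_F: correlate the summed attributions of a
    # random subset S with the drop in output when S is set to the baseline.
    rng = np.random.default_rng(seed)
    phi = g(f, x)
    d = x.shape[0]
    attr_sums, output_drops = [], []
    for _ in range(n_samples):
        S = rng.choice(d, size=subset_size, replace=False)
        x_masked = x.copy()
        x_masked[S] = baseline[S]          # x[x_s = xbar_s]
        attr_sums.append(phi[S].sum())
        output_drops.append(f(x) - f(x_masked))
    return float(np.corrcoef(attr_sums, output_drops)[0, 1])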
3.3 Desideratum: Low Complexity
A complex explanation is one that uses all $d$ features in its explanation of which features of $x$ are important to $f$. Though this explanation may be faithful to the model (as defined above), it may be too difficult for the user to understand (especially if $d$ is large). We define a fractional contribution distribution, where $|\cdot|$ denotes absolute value:
$$\mathbb{P}_g(i) = \frac{|g(f, x)_i|}{\sum_{j \in [d]} |g(f, x)_j|}; \qquad \mathbb{P}_g = \{\mathbb{P}_g(1), \dots, \mathbb{P}_g(d)\}$$
Note that $\mathbb{P}_g$ is a valid probability distribution. Let $\mathbb{P}_g(i)$ denote the fractional contribution of feature $x_i$ to the total magnitude of the attribution. If every feature had equal attribution, the explanation would be complex (even if it is faithful). The simplest explanation would be concentrated on one feature. We define complexity as the entropy of $\mathbb{P}_g$.

Definition 4 (Complexity). Given a predictor $f$, an explanation function $g$, and a point $x$, the complexity of $g$ at $x$ is:
$$\mu_C(f, g; x) = \mathbb{E}_i\big[-\ln(\mathbb{P}_g(i))\big] = -\sum_{i=1}^{d} \mathbb{P}_g(i) \ln(\mathbb{P}_g(i))$$
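A minimal Python sketch of Definition 4 follows, using the usual convention $0 \ln 0 = 0$; the names are illustrative.

import numpy as np

def complexity(f, g, x, eps=1e-12):
    # mu_C: entropy of the fractional-contribution distribution P_g.
    phi = np.abs(g(f, x))
    p = phi / (phi.sum() + eps)   # fractional contribution of each feature
    p = p[p > 0]                  # convention: 0 * ln(0) = 0
    return float(-(p * np.log(p)).sum())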
4 Aggregating Explanations
Given a trained predictor $f$, a set of explanation functions $\mathcal{G}_m = \{g_1, \dots, g_m\}$, a criterion $\mu$ to optimize, and a set of inputs $\mathcal{D}_x$, we want to find an aggregate explanation function $g_{agg}$ that satisfies $\mu$ at least as well as any $g_i \in \mathcal{G}_m$. Let $h(\cdot)$ represent some function that combines $m$ explanations into a consensus $g_{agg} = h(\mathcal{G}_m)$. We now explore different candidates for $h(\cdot)$.
4.1 Convex Combination
Suppose we have two different explanation functions $g_1$ and $g_2$ and have chosen a criterion $\mu$ to evaluate a $g$. Consider an aggregate explanation, $g_{agg} = h(g_1, g_2)$. A potential $h(\cdot)$ is a convex combination where $g_{agg} = h(g_1, g_2) = w\, g_1 + (1 - w)\, g_2 = \mathbf{w}^\top \mathcal{G}_m$.

Proposition 1. If $D$ is the $\ell_2$ distance and $\mu = \mu_A$ (average sensitivity), the following holds:
$$\mu_A(g_{agg}) \leq w\, \mu_A(g_1) + (1 - w)\, \mu_A(g_2)$$
Proof. Assuming $\mathbb{P}_x(z)$ is uniform, we can apply the triangle inequality and the convexity of $D$ to arrive at the above.
A convex combination of explanation functions thus yields an aggregate explanation function that is at most as sensitive as any of the explanation functions taken alone. In order to learn $w$ given $g_1$ and $g_2$, we set up an objective as follows:
$$w^* = \arg\min_{w}\ \mathbb{E}_{x \sim \mathcal{D}_x}\big[\mu_A(g_{agg}(f, x))\big] \qquad (1)$$
Assuming a uniform distribution around all $x \in \mathcal{D}_x$, we can rewrite this as:
$$w^* = \arg\min_{w} \int_{x \sim \mathcal{D}_x} \int_{z \in N_r} D\big(g_{agg}(x), g_{agg}(z)\big)\, \mathbb{P}_x(z)\, dz\, dx$$
By Cauchy-Schwarz, we get the following:
$$w^* \leq \arg\min_{w} \int_{x \sim \mathcal{D}_x} \int_{z \in N_r} D(a, b)\, dz\, dx$$
where $a = w\, g_1(f, x) + (1 - w)\, g_2(f, x)$ and $b = w\, g_1(f, z) + (1 - w)\, g_2(f, z)$. This implies that $w^*$ will be minimal when one element of $\mathbf{w}$ is 0 and the other is 1. Therefore, a convex combination of two explanation functions, found by solving Equation (1), will be at most as sensitive as the least sensitive explanation function.
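As an illustration of Equation (1), the sketch below grid-searches the weight $w$ and keeps the one whose aggregate has the lowest empirical average sensitivity over a dataset; consistent with the argument above, the selected weight typically lands at 0 or 1. The helper names, the grid search (in place of an exact solver), and the default $\ell_\infty$/$\ell_2$ metrics are all assumptions of this sketch.

import numpy as np

def convex_aggregate(g1, g2, w):
    # g_agg = w * g1 + (1 - w) * g2, as in Section 4.1.
    return lambda f, x: w * g1(f, x) + (1 - w) * g2(f, x)

def avg_sensitivity(g, f, x, candidates, radius):
    # Empirical average sensitivity under a uniform P_x over the neighborhood.
    nbrs = [z for z in candidates
            if np.max(np.abs(x - z)) <= radius and f(x) == f(z)]
    if not nbrs:
        return 0.0
    phi_x = g(f, x)
    return float(np.mean([np.linalg.norm(phi_x - g(f, z)) for z in nbrs]))

def fit_weight(g1, g2, f, data, radius, grid=np.linspace(0.0, 1.0, 11)):
    # Grid-search stand-in for Equation (1): pick the w whose aggregate has
    # the smallest average sensitivity over the dataset.
    def objective(w):
        g_agg = convex_aggregate(g1, g2, w)
        return np.mean([avg_sensitivity(g_agg, f, x, data, radius) for x in data])
    return min(grid, key=objective)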
4.2 Centroid Aggregation
Another sensible candidate for $h(\cdot)$ to combine $m$ explanation functions is based on centroids with respect to some distance function $D : \mathcal{G} \times \mathcal{G} \mapsto \mathbb{R}$, so that:
$$g_{agg} \in \arg\min_{g \in \mathcal{G}}\ \mathbb{E}_{g_i \in \mathcal{G}_m}\big[D(g, g_i)^p\big] = \arg\min_{g \in \mathcal{G}} \sum_{i=1}^{m} D(g, g_i)^p$$
where $p$ is a positive constant. The simplest examples of distances are the $\ell_2$ and $\ell_1$ distances with real-valued attributions where $\mathcal{G} \subseteq \mathbb{R}^d$.

Proposition 2. When $D$ is the $\ell_2$ distance and $p = 2$, the aggregate explanation is the feature-wise sample mean:
$$g_{agg}(f, x) = g_{avg}(f, x) = \frac{1}{m} \sum_{i=1}^{m} g_i(f, x) \qquad (2)$$

Proposition 3. When $D$ is the $\ell_1$ distance and $p = 1$, the aggregate explanation is the feature-wise sample median:
$$g_{agg}(f, x) = \operatorname{med}\{\mathcal{G}_m\}$$

Propositions 2 and 3 follow from standard results in statistics: the mean minimizes the sum of squared differences and the median minimizes the sum of absolute deviations [Berger, 2013].
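The centroid aggregates of Propositions 2 and 3 therefore reduce to feature-wise statistics over the stacked attribution vectors, as in the following sketch (names illustrative):

import numpy as np

def centroid_aggregate(explanations, distance="l2"):
    # Centroid aggregation of m attribution vectors for a fixed (f, x):
    # feature-wise mean for squared l2 distance (Proposition 2),
    # feature-wise median for l1 distance (Proposition 3).
    E = np.asarray(explanations)          # shape (m, d)
    if distance == "l2":
        return E.mean(axis=0)             # minimizes sum of squared distances
    if distance == "l1":
        return np.median(E, axis=0)       # minimizes sum of absolute deviations
    raise ValueError("unsupported distance")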
We could obtain rank-valued attributions by taking any quantitative vector-valued attributions and ranking features according to their values. If $D$ is the Kendall-tau distance with rank-valued attributions where $\mathcal{G} \subseteq S_d$ (the set of permutations over $d$ features), then the resulting aggregation mechanism via computing the centroid is called the Kemeny-Young rule. For rank-valued attributions, any aggregation mechanism falls under the rank aggregation problem in social choice theory, for which many practical "voting rules" exist [Bhatt et al., 2019a].
We analyze the error of a candidate $g_{agg}$. Suppose the optimal explanation for $x$ using $f$ is $g^*(f, x)$ and suppose $g_{agg}$ is the mean explanation for $x$ in Equation (2). Let $\epsilon_{i,x} = \|g^*(f, x) - g_i(f, x)\|$ be the error between the optimal explanation and the $i$th explanation function.

Proposition 4. The error between the aggregate explanation $g_{agg}(f, x)$ and the optimal explanation $g^*(f, x)$ satisfies:
$$\epsilon_{agg} \leq \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \epsilon_{j, x_i}}{mn}$$

Proof. For a fixed $x$, we have:
$$\epsilon_{agg, x} = \big\|g^*(f, x) - g_{agg}(f, x)\big\| = \Big\|\frac{m\, g^*(f, x)}{m} - \frac{1}{m} \sum_{i=1}^{m} g_i(f, x)\Big\| \leq \frac{1}{m} \sum_{i=1}^{m} \big\|g^*(f, x) - g_i(f, x)\big\| = \frac{\sum_{i=1}^{m} \epsilon_{i,x}}{m}$$
Averaging across $\mathcal{D}_x$, we obtain the result.
Hence, by aggregating, we do at least as well on average as using one explanation function alone. Many gradient-based explanation functions fit to noise [Hooker et al., 2019]. One way to reduce noise would be to aggregate by ensembling or averaging. As proven in Proposition 4, the typical error of the aggregate is at most the expected error of each function alone.
5 Lowering Complexity Via Aggregation
In this section, we describe iterative algorithms for aggregating explanation functions to obtain $g_{agg}(f, x)$ with lower complexity whilst combining $m$ candidate explanation functions $\mathcal{G}_m = \{g_1, \dots, g_m\}$. We desire a $g_{agg}(f, x)$ that contains information from all candidate explanations $g_i(f, x)$ yet has entropy less than or equal to that of each explanation $g_i(f, x)$. As discussed, a reasonable candidate for an aggregate explanation function is the sample mean given by Equation (2). We may want $g_{agg}(f, x)$ to approach the sample mean, $g_{avg}(f, x)$; however, the sample mean may have greater complexity than that of each $g_i(f, x)$.

For example, let $g_1(f, x) = [1, 0]^\top$ and $g_2(f, x) = [0, 1]^\top$. The sample mean is $g_{avg}(f, x) = [0.5, 0.5]^\top$. Both $g_1$ and $g_2$ have the minimum possible complexity of 0, while $g_{avg}$ has the maximum possible complexity, $\log(2)$. Our aggregation technique must ensure that $g_{agg}(f, x)$ approaches $g_{avg}(f, x)$ while guaranteeing $g_{agg}(f, x)$ has complexity less than or equal to that of each $g_i(f, x)$. We now present two approaches for learning a lower complexity explanation, visually represented in Figure 1.
5.1 Gradient-Descent Style Method
Our first approach is similar to gradient descent. Starting from each $g_i(f, x)$, we iteratively move towards $g_{avg}(f, x)$ in each of the $d$ directions (i.e., changing the $k$th feature by a small amount) if the complexity decreases with that move. We stop moving when the complexity no longer decreases or $g_{avg}(f, x)$ is reached. Simultaneously, we start from $g_{avg}(f, x)$ and iteratively move towards each $g_i(f, x)$ in each of the $d$ directions if the complexity decreases. We stop moving when the complexity no longer decreases or any of the $g_i(f, x)$ are reached. The final $g_{agg}(f, x)$ is the location that has the smallest complexity from these $2m$ different walks. Since we only move if the complexity decreases and start from each $g_i(f, x)$, the entropy of $g_{agg}(f, x)$ is guaranteed to be less than or equal to the entropy of all $g_i(f, x)$.
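A minimal Python sketch of this procedure is below. It uses a fixed fractional step towards the target in place of the analytic partial derivative of $\mu_C$ given in Appendix C, and all names, step sizes, and iteration caps are illustrative assumptions rather than the exact procedure of Algorithm 1.

import numpy as np

def entropy(phi, eps=1e-12):
    # Complexity mu_C of an attribution vector (Definition 4).
    p = np.abs(phi) / (np.abs(phi).sum() + eps)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def greedy_walk(start, target, step=0.05, max_sweeps=200):
    # Move one coordinate at a time towards `target`, accepting a step only
    # if the entropy decreases; stop once no coordinate improves.
    cur = np.array(start, dtype=float)
    for _ in range(max_sweeps):
        improved = False
        for k in range(cur.shape[0]):
            trial = cur.copy()
            trial[k] += step * (target[k] - cur[k])
            if entropy(trial) < entropy(cur):
                cur, improved = trial, True
        if not improved or np.allclose(cur, target):
            break
    return cur

def gradient_descent_style_aggregate(explanations):
    # Walk from each g_i towards the mean and from the mean towards each g_i,
    # then keep the lowest-entropy endpoint among the 2m walks (Section 5.1).
    E = [np.asarray(e, dtype=float) for e in explanations]
    g_avg = np.mean(E, axis=0)
    endpoints = [greedy_walk(e, g_avg) for e in E] + \
                [greedy_walk(g_avg, e) for e in E]
    return min(endpoints, key=entropy)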
5.2 Region Shrinking Method
In our second approach, we consider the closed region, $R$, which is the convex hull of all the explanation functions $g_i(f, x)$. Notice that region $R$ initially contains $g_{avg}$. We consider an iterative approach to find the global minimum in the region $R$. As before, we consider the convex combination formed by two explanation functions, $g_i$ and $g_j$. Using convex optimization, we find the value on the line segment between $g_i$ and $g_j$ that has the minimum complexity; essentially, we iteratively shrink the region. For the region shrinking method, the convex combination formed by $g_i$ and $g_j$ is:
$$w\, g_i + (1 - w)\, g_j, \quad w \in [0, 1]$$
For every pair of functions in $\mathcal{G}_m$, we find the point that produces the minimum complexity in the convex combination of the functions, producing a new set of candidates $\mathcal{G}'_m$. $g_{agg}$ is the element in the set $\mathcal{G}'_m$ with minimal complexity after $K$ iterations. In each iteration, a point is chosen if it has the minimum complexity of all the points in a convex combination. Thus, the minimum complexity of the set $\mathcal{G}'_m$ decreases or remains constant with each iteration.
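A minimal Python sketch of region shrinking follows. A grid scan over $w$ stands in for the convex optimization along each segment, the entropy helper is redefined for self-containment, and all names and iteration counts are illustrative.

import numpy as np
from itertools import combinations

def entropy(phi, eps=1e-12):
    p = np.abs(phi) / (np.abs(phi).sum() + eps)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def min_entropy_on_segment(gi, gj, n_grid=101):
    # Scan w * gi + (1 - w) * gj over w in [0, 1] and return the lowest-entropy
    # point; a grid scan stands in for the convex optimization step.
    candidates = [w * gi + (1 - w) * gj for w in np.linspace(0.0, 1.0, n_grid)]
    return min(candidates, key=entropy)

def region_shrinking_aggregate(explanations, n_iters=3, keep=None):
    # Repeatedly replace the candidate set with the minimum-entropy points
    # found on every pairwise segment, then return the best survivor.
    S = [np.asarray(e, dtype=float) for e in explanations]
    keep = keep or len(S)
    for _ in range(n_iters):
        if len(S) < 2:
            break
        S = [min_entropy_on_segment(gi, gj) for gi, gj in combinations(S, 2)]
        S = sorted(S, key=entropy)[:keep]   # keep the N lowest-entropy points
    return min(S, key=entropy)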
6 Lowering Sensitivity Via Aggregation
To construct an aggregate explanation function $g$ that minimizes sensitivity, we would need to ensure that a test point's explanation is a function of the explanations of its nearest neighbors under $\rho$. This is a natural analog for how humans reason: we use past similar events (training data) and facts about the present (individual features) to make decisions [Bhatt et al., 2019b]. We now contribute a new explanation function $g_{AVA}$ that combines the Shapley value explanations of a test point's nearest neighbors to explain the test point.

Figure 1: Visual examples of the two complexity-lowering aggregation algorithms: the gradient-descent style (a) and region shrinking (b) methods, using explanation functions $g_1$, $g_2$, $g_3$.
6.1 Shapley Value Review
Borrowing from game theory, Shapley values denote the marginal contributions of a player to the payoff of a coalitional game. Let $T$ be the set of players and let $v : 2^T \to \mathbb{R}$ be the characteristic function, where $v(S)$ denotes the worth (contribution) of the players in $S \subseteq T$. The Shapley value of player $i$'s contribution (averaging player $i$'s marginal contributions to all possible subsets $S$) is:
$$\phi_i(v) = \frac{1}{|T|} \sum_{S \subseteq T \setminus \{i\}} \binom{|T| - 1}{|S|}^{-1} \big(v(S \cup \{i\}) - v(S)\big)$$
Let $\Phi \in \mathbb{R}^{|T|}$ be a Shapley value contribution vector for all players in the game, where $\phi_i(v)$ is the $i$th element of $\Phi$.
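For small games, the formula above can be evaluated exactly, as in the following sketch; the characteristic function `v` is assumed to take a Python set of player indices and return a scalar, and all names are illustrative. The weights are written as $1/\big(|T|\binom{|T|-1}{|S|}\big)$, matching the expression above; the cost is exponential in the number of players, which is why the approximations discussed in Section 6.2 are used for feature attribution in practice.

import numpy as np
from itertools import combinations
from math import comb

def exact_shapley(v, n_players):
    # Exact Shapley values for characteristic function v over players 0..n-1.
    phi = np.zeros(n_players)
    players = list(range(n_players))
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = 1.0 / (n_players * comb(n_players - 1, size))
                phi[i] += weight * (v(set(S) | {i}) - v(set(S)))
    return phi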
6.2 Shapley Values as Explanations
In the feature importance literature, we formulate a similar problem where the game's payoff is the predictor's output $y = f(x)$, the players are the $d$ features of $x$, and the $\phi_i$ values represent the contribution of $x_i$ to the game $f(x)$. Let the characteristic function be the importance score of a subset of features $x_s$, where $\mathbb{E}_Y[\cdot \mid x]$ is an expectation over $P_f(\cdot \mid x)$:
$$v_x(S) = -\mathbb{E}_Y\left[\log \frac{1}{P_f(Y \mid x_s)}\ \Big|\ x\right]$$
This characteristic function denotes the negative of the expected number of bits required to encode the predictor's output based on the features in a subset $S$ [Chen et al., 2019]. Shapley value contributions can be approximated via Monte Carlo sampling [Štrumbelj and Kononenko, 2014] or via weighted least squares [Lundberg and Lee, 2017].
6.3 Aggregate Valuation of Antecedents
We now explore how to explain a test point in terms of the Shapley value explanations of its neighbors. Termed Aggregate Valuation of Antecedents (AVA), we derive an explanation function that explains a data point in terms of the explanations of its neighbors. We do the following: suppose we want to find an explanation function $g_{AVA}(f, x_{test})$ for a point of interest $x_{test}$. First we find the $k$ nearest neighbors of $x_{test}$ under $\rho$, denoted by $N_k(x_{test}, \mathcal{D})$:
$$N_k(x_{test}, \mathcal{D}) = \arg\min_{\mathcal{N} \subset \mathcal{D}, |\mathcal{N}| = k} \sum_{z \in \mathcal{N}} \rho(x_{test}, z)$$
We define $g_{AVA}(f, x_{test}) = \Phi_{x_{test}}$ as the explanation function where:
$$g_{AVA}(f, x_{test})_i = \phi_i(v_{AVA}) = \sum_{z \in N_k(x_{test})} \frac{g_{SHAP}(f, z)_i}{\rho(x_{test}, z)} = \sum_{z \in N_k(x_{test})} \frac{\phi_i(v_z)}{\rho(x_{test}, z)}$$

In essence, we weight each neighbor's Shapley value contribution by the inverse distance from the neighbor to the test point. AVA is closely related to bootstrap aggregation from classical statistics, as we take an average of model outputs to improve explanation function stability.
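A minimal Python sketch of $g_{AVA}$ is below: find the $k$ nearest training points under $\rho$, weight each neighbor's Shapley explanation by inverse distance, and optionally normalize the weights to sum to 1, as in the convex-combination formulation discussed after Theorem 5. The function `g_shap`, the default metric, and all other names are illustrative stand-ins.

import numpy as np

def ava_explanation(f, x_test, train_points, g_shap, rho=None, k=5, normalize=True):
    # AVA: explain x_test by a distance-weighted combination of the Shapley
    # value explanations of its k nearest training neighbors.
    rho = rho or (lambda a, b: np.max(np.abs(a - b)))
    dists = np.array([rho(x_test, z) for z in train_points])
    nearest = np.argsort(dists)[:k]                    # N_k(x_test, D)
    weights = 1.0 / np.maximum(dists[nearest], 1e-12)  # inverse-distance weights
    if normalize:
        weights = weights / weights.sum()              # convex-combination variant
    return sum(w * g_shap(f, train_points[i]) for w, i in zip(weights, nearest))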
Theorem 5. $g_{AVA}(f, x_{test})$ is a Shapley value explanation.

Proof. We want to show that $g_{AVA}(f, x_{test}) = \Phi_{x_{test}}$ is indeed a vector of Shapley values. Let $g_{SHAP}(f, z) = \Phi_z$ be the vector of Shapley value contributions for a point $z \in N_k$. By [Lundberg and Lee, 2017], we know $g_{SHAP}(f, z)_i = \phi_i(v_z)$ is a unique Shapley value for the characteristic function $v_z$. By linearity of Shapley values [Shapley, 1953], we know that:
$$\phi_i(v_{z_1} + v_{z_2}) = \phi_i(v_{z_1}) + \phi_i(v_{z_2}) \qquad (3)$$
This means that $\Phi_{z_1} + \Phi_{z_2}$ will yield a unique Shapley value contribution vector for the characteristic function $v_{z_1} + v_{z_2}$. By linearity (or additivity), we know for any scalar $\alpha$:
$$\alpha\, \phi_i(v_z) = \phi_i(\alpha\, v_z) \qquad (4)$$
This means $\alpha\, \Phi_z$ will yield a unique Shapley value contribution vector for the characteristic function $\alpha\, v_z$. Now define:
$$\Phi_{x_{test}} = \sum_{z \in N_k(x_{test})} \frac{\Phi_z}{\rho(x_{test}, z)}$$
We can conclude that $\Phi_{x_{test}}$ is a vector of Shapley values.
While [Sundararajan et al., 2017] takes a path integral from a fixed reference baseline $\bar{x}$ and [Lundberg and Lee, 2017] only considers attribution along the straight line path between $\bar{x}$ and $x_{test}$, AVA takes a weighted average of attributions along paths from training points in $N_k$ to $x_{test}$. AVA can similarly be thought of as a convex combination of explanation functions where the explanation functions are the explanations of the neighbors of $x_{test}$ and the weights are $\rho(x_{test}, z)^{-1}$. Though the weights are guaranteed to be non-negative, we normalize the weights to sum to 1 and edit the AVA formulation to be: $g_{AVA}(f, x_{test}) = \rho_{tot}\, \Phi_{x_{test}}$, where $\rho_{tot} = \big(\sum_{z \in N_k(x_{test})} \rho(x_{test}, z)^{-1}\big)^{-1}$. Notice this formulation is a specific convex combination as described before; therefore, AVA will result in a lower sensitivity than $g_{SHAP}(f, x)$ alone.
6.4 Medical Connection
Similar to how a model uses input features to reach an output, medical professionals learn how to proactively search for risk predictors in a patient. Medical professionals not only use patient attributes (e.g., vital signs, personal information) to make a diagnosis but also leverage experiences with past patients; for example, if a doctor treated a rare disease over a decade ago, then that experience can be crucial when attributes alone are uninformative about how to diagnose [Goold and Lipkin Jr, 1999]. This is analogous to "close" training points affecting a predictor's output. AVA combines the attributions of past training points (past patients) to explain an unseen test point (current patient). When using the MIMIC dataset [Johnson et al., 2016], AVA models the aforementioned intuition.
7 Experiments
We now report some empirical results. We evaluate models trained on the following datasets: Adult, Iris [Dua and Graff, 2017], MIMIC [Johnson et al., 2016], and MNIST [LeCun et al., 1998]. We use the following explanation functions: SHAP [Lundberg and Lee, 2017], Shapley Sampling (SS) [Štrumbelj and Kononenko, 2014], Gradient Saliency (Grad) [Baehrens et al., 2010], Grad*Input (G*I) [Shrikumar et al., 2017], Integrated Gradients (IG) [Sundararajan et al., 2017], and DeepLift (DL) [Shrikumar et al., 2017].
For all tabular datasets, we train a multilayer perceptron (MLP) with leaky-ReLU activation using the ADAM optimizer. For Iris [Dua and Graff, 2017], we train our model to 96% test accuracy. For Adult [Dua and Graff, 2017], our model has 82% test accuracy. As motivated in Section 6.4, we use MIMIC (Medical Information Mart for Intensive Care III) [Johnson et al., 2016]. We extract seventeen real-valued features deemed critical, per [Purushotham et al., 2018], for sepsis prediction. Our model gets 91% test accuracy on the task. For MNIST [LeCun et al., 1998], our model is a convolutional neural network and has 90% test accuracy.
For experiments with a baseline $\bar{x}$, a zero baseline implies that we set features to 0, and an average baseline uses the average feature value in $\mathcal{D}$. Before doing aggregation, we unit norm all explanations. For the complexity criterion, we take the positive $\ell_1$ norm. We set $D = \ell_2$ and $\rho = \ell_\infty$.
7.1 Faithfulness $\mu_F$
In Table 2, we report results for faithfulness for various explanation functions. When evaluating, we take the average of multiple runs where, in each run, we see at least 50 datapoints; for each datapoint, we randomly select $|S|$ features and replace them with baseline values. We then calculate the Pearson correlation coefficient between the predicted logits of each modified test point and the average explanation attribution for only the subset of features. We notice that, as subset size increases, faithfulness increases until the subset is large enough to contain all informative features. We find that Shapley values, approximated with weighted least squares, are the most faithful explanation function for smaller datasets.
7.2 Max and Avg Sensitivity $\mu_M$ and $\mu_A$
In Table 3, we report the max and average sensitivities for various explanation functions. To evaluate the sensitivity criterion, we sample a set of test points from $\mathcal{D}$ and an additional larger set of training points. We then find the training points that fall within a radius $r$ neighborhood of each test point and find the distance between each nearby training point explanation and the test point explanation to get a mean and max. We average over ten random runs of this procedure.
Table 1: Qualitative example of aggregation to lower complexity ($\mu_C$) on an MNIST input (image panels omitted). Best (DeepLift): $\mu_C = 3.688$; Convex: $\mu_C = 3.685$; Gradient-Descent: $\mu_C = 3.575$; Region-Shrinking: $\mu_C = 3.208$. We show that it is possible to lower complexity slightly with both of our approaches; note that achieving the lowest complexity on an image would imply that all attribution is placed on a single pixel.
METHOD   ADULT (|S|=2)   IRIS (|S|=2)   MIMIC (|S|=10)   MIMIC (|S|=20)
SHAP     (62, 60)        (67, 68)       (31, 36)         (37, 47)
SS       (46, 27)        (32, 36)       (59, 58)         (38, 45)
GRAD     (30, 53)        (14, 16)       (37, 41)         (28, 63)
G*I      (38, 39)        (27, 30)       (54, 48)         (59, 43)
IG       (47, 33)        (60, 57)       (66, 51)         (68, 51)
DL       (58, 43)        (46, 48)       (84, 54)         (43, 45)

Table 2: Faithfulness $\mu_F$ averaged over a test set, reported as (Zero Baseline, Training Average Baseline). Exact quantities can be obtained by dividing table entries by $10^2$.
METHOD   ADULT (r=2)   IRIS (r=0.2)   MIMIC (r=4)
SHAP     (60, 54)      (310, 287)     (6, 5)
SS       (191, 168)    (477, 345)     (83, 81)
GRAD     (60, 50)      (68, 66)       (28, 28)
G*I      (86, 71)      (298, 279)     (77, 50)
IG       (19, 17)      (495, 462)     (19, 15)
DL       (74, 74)      (850, 820)     (135, 111)

Table 3: Sensitivity, reported as (Max $\mu_M$, Avg $\mu_A$). Exact quantities can be obtained by dividing table entries by $10^3$.
Sensitivity is highly dependent on the dimensionality $d$ and on the radius $r$. We find that sensitivity decreases as $r$ increases. Empirically, for MIMIC, Shapley values approximated by weighted least squares (SHAP) are the least sensitive.
7.3 MNIST Complexity $\mu_C$
In Table 1, we provide a qualitative example of the gradient-descent style and region-shrinking methods for lowering the complexity of explanations from a model trained on MNIST. We show an example with images since it illustrates the notion of lower complexity well; however, other data types (tabular) might be better suited for complexity optimization.
7.4 AVA
Our empirical findings support the use of an AVA explanation if low sensitivity is desired. [Ghorbani et al., 2019] note that perturbation-based explanations (like $g_{SHAP}$) are less sensitive than their gradient-based counterparts. In Table 4, we show that AVA explanations not only have lower sensitivities in all experiments but also are less complex (depending on the radius $r$ and the number of features $d$).
METHOD                 ADULT         IRIS          MIMIC
$\mu_A(f, g_{SHAP})$   0.16 ± 0.11   0.22 ± 0.25   0.47 ± 0.12
$\mu_A(f, g_{AVA})$    0.07 ± 0.07   0.13 ± 0.18   0.31 ± 0.13
$\mu_M(f, g_{SHAP})$   0.68 ± 0.13   1.20 ± 0.36   0.83 ± 0.17
$\mu_M(f, g_{AVA})$    0.52 ± 0.11   1.18 ± 0.28   0.72 ± 0.22
$\mu_C(f, g_{SHAP})$   1.94 ± 0.26   1.36 ± 0.36   2.33 ± 0.23
$\mu_C(f, g_{AVA})$    1.93 ± 0.24   1.24 ± 0.32   2.61 ± 0.29

Table 4: AVA lowers the sensitivity of Shapley value explanations across all datasets. When $d$ is small (fewer features), AVA explanations are slightly less complex.
After finding the average distance between pairs of points, we use $r = 1$ for Adult, $r = 0.3$ for Iris, and $r = 10$ for MIMIC.
8 Conclusion
Borrowing from earlier work in social science and the philosophy of science, we codify low sensitivity, high faithfulness, and low complexity as three desirable properties of explanation functions. We define these three properties for feature-based explanation functions, develop an aggregation scheme for learning combinations of various explanation functions, and devise schemes to learn explanations with lower complexity (iterative approaches) and lower sensitivity (AVA). We hope that this work will provide practitioners with a principled way to evaluate feature-based explanations and to learn an explanation which aggregates and optimizes for criteria desired by end users. Though we consider one criterion at a time, future work could further axiomatize our criteria, explore the interaction between different evaluation criteria, and devise a multi-objective optimization approach to finding a desirable explanation; for example, can we develop a procedure for learning a less sensitive and less complex explanation function simultaneously?
Acknowledgements
We thank reviewers for their feedback. We thank Pradeep Ravikumar, John Shi, Brian Davis, Kathleen Ruan, Javier Antorán, James Allingham, and Adithya Raghuraman for their comments and help. UB acknowledges support from DeepMind and the Leverhulme Trust via the Leverhulme Centre for the Future of Intelligence (CFI) and from the Partnership on AI. AW acknowledges support from the David MacKay Newton Research Fellowship at Darwin College, The Alan Turing Institute under EPSRC grant EP/N510129/1 & TU/B/000074, and the Leverhulme Trust via the CFI.
References
[Adebayo et al., 2018]Julius Adebayo, Justin Gilmer,
Michael Muelly, Ian Goodfellow, Moritz Hardt, and
Been Kim. Sanity checks for saliency maps. In Ad-
vances in Neural Information Processing Systems, pages
9505–9515, 2018.
[Ancona et al., 2018] Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. In 6th International Conference on Learning Representations (ICLR 2018), 2018.
[Baehrens et al., 2010] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. JMLR, 11(Jun):1803–1831, 2010.
[Batterman and Rice, 2014]Robert W Batterman and
Collin C Rice. Minimal model explanations. Philosophy
of Science, 81(3):349–376, 2014.
[Berger, 2013]James O Berger. Statistical decision theory
and Bayesian analysis. Springer Science & Business Me-
dia, 2013.
[Bhatt et al., 2019a]Umang Bhatt, Pradeep Ravikumar,
et al. Building human-machine trust via interpretability.
In Proceedings of the AAAI Conference on Artificial Intel-
ligence, volume 33, pages 9919–9920, 2019.
[Bhatt et al., 2019b] Umang Bhatt, Pradeep Ravikumar, and José M. F. Moura. Towards aggregating weighted feature attributions. arXiv:1901.10040, 2019.
[Bhatt et al., 2020] Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, and Peter Eckersley. Explainable machine learning in deployment. ACM Conference on Fairness, Accountability, and Transparency (FAT*), 2020.
[Bylinskii et al., 2018] Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, and Frédo Durand. What do different evaluation metrics tell us about saliency models? IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(3):740–757, 2018.
[Camburu et al., 2019]Oana-Maria Camburu, Eleonora
Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, and
Phil Blunsom. Can I trust the explainer? Verifying
post-hoc explanatory methods. arXiv:1910.02065, 2019.
[Carter et al., 2019]Brandon Carter, Jonas Mueller, Sid-
dhartha Jain, and David Gifford. What made you do this?
understanding black-box decisions with sufficient input
subsets. In The 22nd International Conference on Arti-
ficial Intelligence and Statistics, pages 567–576, 2019.
[Chang et al., 2019]Chun-Hao Chang, Elliot Creager, Anna
Goldenberg, and David Duvenaud. Explaining image clas-
sifiers by counterfactual generation. In International Con-
ference on Learning Representations, 2019.
[Chen et al., 2018] Jianbo Chen, Le Song, Martin Wainwright, and Michael Jordan. Learning to explain: An information-theoretic perspective on model interpretation. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 883–892, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.
[Chen et al., 2019] Jianbo Chen, Le Song, Martin J Wainwright, and Michael I Jordan. L-Shapley and C-Shapley: Efficient model interpretation for structured data. International Conference on Learning Representations, 2019.
[Davis et al., 2020]B. Davis, U. Bhatt, K. Bhardwaj,
R. Marculescu, and J. M. F. Moura. On network sci-
ence and mutual information for explaining deep neural
networks. In ICASSP 2020 - 2020 IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP), pages 8399–8403, 2020.
[Dua and Graff, 2017]Dheeru Dua and Casey Graff. UCI
machine learning repository, 2017.
[Ghorbani et al., 2019]Amirata Ghorbani, Abubakar Abid,
and James Zou. Interpretation of neural networks is frag-
ile. In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 33, pages 3681–3688, 2019.
[Gilpin et al., 2018]Leilani H Gilpin, David Bau, Ben Z
Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal.
Explaining explanations: An overview of interpretability
of machine learning. In 2018 IEEE 5th International Con-
ference on data science and advanced analytics (DSAA),
pages 80–89. IEEE, 2018.
[Goold and Lipkin Jr, 1999]Susan Dorr Goold and Mack
Lipkin Jr. The doctor–patient relationship: challenges,
opportunities, and strategies. Journal of general internal
medicine, 14(Suppl 1):S26, 1999.
[Grabska-Barwińska, 2020] Agnieszka Grabska-Barwińska. Measuring and improving the quality of visual explanations. arXiv preprint arXiv:2003.08774, 2020.
[Hazard et al., 2019]Christopher J Hazard, Christopher
Fusting, Michael Resnick, Michael Auerbach, Michael
Meehan, and Valeri Korobov. Natively interpretable
machine learning and artificial intelligence: Prelim-
inary results and future directions. arXiv preprint
arXiv:1901.00246, 2019.
[Hind et al., 2019] Michael Hind, Dennis Wei, Murray Campbell, Noel CF Codella, Amit Dhurandhar, Aleksandra Mojsilović, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. TED: Teaching AI to explain its decisions. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 123–129, 2019.
[Honegger, 2018]Milo Honegger. Shedding Light on Black
Box Algorithms. Master’s thesis, Karlsruhe Institute of
Technology, Germany, 2018.
[Hooker et al., 2019]Sara Hooker, Dumitru Erhan, Pieter-
Jan Kindermans, and Been Kim. A benchmark for inter-
pretability methods in deep neural networks. In Advances
in Neural Information Processing Systems, pages 9734–
9745, 2019.
[Johnson et al., 2016] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 2016.
[Kindermans et al., 2019] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. The (un)reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pages 267–280. Springer, 2019.
[Lage et al., 2019]Isaac Lage, Emily Chen, Jeffrey He,
Menaka Narayanan, Been Kim, Samuel J Gershman, and
Finale Doshi-Velez. Human evaluation of models built for
interpretability. In Proceedings of the AAAI Conference
on Human Computation and Crowdsourcing, volume 7,
pages 59–67, 2019.
[LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[Lipton, 2003]Peter Lipton. Inference to the best explana-
tion. Routledge, 2003.
[Lundberg and Lee, 2017]Scott M Lundberg and Su-In Lee.
A unified approach to interpreting model predictions. In
Advances in Neural Information Processing Systems 30
(NeurIPS 2017), pages 4765–4774, 2017.
[Melis and Jaakkola, 2018]David Alvarez Melis and Tommi
Jaakkola. Towards robust interpretability with self-
explaining neural networks. In Advances in Neural Infor-
mation Processing Systems (NeurIPS 2018), 2018.
[Montavon et al., 2018] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, 2018.
[Osman et al., 2020]Ahmed Osman, Leila Arras, and Woj-
ciech Samek. Towards ground truth evaluation of visual
explanations. arXiv preprint arXiv:2003.07258, 2020.
[Plumb et al., 2018]Gregory Plumb, Denali Molitor, and
Ameet S Talwalkar. Model agnostic supervised local ex-
planations. In Advances in Neural Information Processing
Systems, pages 2515–2524, 2018.
[Poursabzi-Sangdeh et al., 2018]Forough Poursabzi-
Sangdeh, Daniel G Goldstein, Jake M Hofman, Jen-
nifer Wortman Vaughan, and Hanna Wallach. Manipulat-
ing and measuring model interpretability. arXiv preprint
arXiv:1802.07810, 2018.
[Purushotham et al., 2018]Sanjay Purushotham, Chuizheng
Meng, Zhengping Che, and Yan Liu. Benchmarking deep
learning models on large healthcare datasets. Journal of
Biomedical Informatics, 83:112–134, 2018.
[Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
[Rieger and Hansen, 2020] Laura Rieger and Lars Kai Hansen. IROF: a low resource evaluation metric for explanation methods. arXiv preprint arXiv:2003.08747, 2020.
[Ruben, 2015]David-Hillel Ruben. Explaining explanation.
Routledge, 2015.
[Samek et al., 2016] Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, and Klaus-Robert Müller. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2660–2673, 2016.
[Shapley, 1953]Lloyd S Shapley. A value for n-person
games. In Contributions to the Theory of Games II, pages
307–317, 1953.
[Shrikumar et al., 2017]Avanti Shrikumar, Peyton Green-
side, and Anshul Kundaje. Learning important features
through propagating activation differences. In Proceed-
ings of the 34th International Conference on Machine
Learning-Volume 70 (ICML 2017), pages 3145–3153.
Journal of Machine Learning Research, 2017.
[Štrumbelj and Kononenko, 2014] Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3):647–665, 2014.
[Sundararajan et al., 2017]Mukund Sundararajan, Ankur
Taly, and Qiqi Yan. Axiomatic attribution for deep net-
works. In Proceedings of the 34th International Confer-
ence on Machine Learning-Volume 70 (ICML 2017), pages
3319–3328. Journal of Machine Learning Research, 2017.
[Wang et al., 2020]Zifan Wang, Piotr Mardziel, Anupam
Datta, and Matt Fredrikson. Interpreting interpretations:
Organizing attribution methods by criteria. arXiv preprint
arXiv:2002.07985, 2020.
[Warnecke et al., 2019]Alexander Warnecke, Daniel Arp,
Christian Wressnegger, and Konrad Rieck. Evaluating ex-
planation methods for deep learning in security. arXiv
preprint arXiv:1906.02108, 2019.
[Yang and Kim, 2019]Mengjiao Yang and Been Kim. BIM:
Towards quantitative evaluation of interpretability meth-
ods with ground truth. arXiv:1907.09701, 2019.
[Yang et al., 2019]Fan Yang, Mengnan Du, and Xia
Hu. Evaluating explanation without ground truth
in interpretable machine learning. arXiv preprint
arXiv:1907.06831, 2019.
[Yeh et al., 2019] Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I Inouye, and Pradeep K Ravikumar. On the (in)fidelity and sensitivity of explanations. In Advances in Neural Information Processing Systems, pages 10965–10976, 2019.
[Zhang et al., 2019]Hao Zhang, Jiayi Chen, Haotian Xue,
and Quanshi Zhang. Towards a unified evaluation of ex-
planation methods without ground truth. arXiv preprint
arXiv:1911.09017, 2019.
A Additional Evaluation Criteria
In addition to the aforementioned three criteria, there are many other desirable criteria for a $g$. To assist practitioners, we now collect and list these additional quantitative evaluation criteria for feature-level explanations. It is possible to evaluate all criteria for both perturbation-based explanations [Štrumbelj and Kononenko, 2014; Lundberg and Lee, 2017] and gradient-based explanations [Sundararajan et al., 2017; Shrikumar et al., 2017]. Note we omit evaluation criteria that assume access to ground-truth explanations for training points; for a thorough treatment of this topic, see [Hind et al., 2019; Osman et al., 2020]. We do not delve into human-centered evaluation of explanation functions either; see [Gilpin et al., 2018; Poursabzi-Sangdeh et al., 2018; Yang et al., 2019] for detailed discussions.
Predictability of Explanations

We would want to ensure that explanations from $g$ are predictable. As such, $g(f, x)$ ought not vary over function calls. [Honegger, 2018] notes that identical inputs should give identical explanations.

Definition 5 (Identity). Given a predictor $f$, an explanation function $g$, and distance metrics $D$ and $\rho$, we define the identity criterion for $g$ on $\mathcal{D}$ as:
$$\mu_{IDENTITY}(f, g) = \mathbb{E}_{x \in \mathcal{D}_x}\big[D(g(f, x), g(f, x))\big] = \mathbb{E}_{x \sim \mathcal{D}_x}\big[\|g(f, x) - g(f, x)\|_0\big]$$
Note the above two are equivalent, and we take the $\ell_0$ norm of the difference between two separate calls to $g$ with the same input $x$. The identity criterion favors non-stochastic explanation functions. We would also want to ensure that any non-identical inputs have non-identical explanations.
Definition 6 (Separability). Given a predictor $f$, an explanation function $g$, and distance metrics $D$ and $\rho$, we define the separability of $g$ on $\mathcal{D}$ as:
$$\mu_{SEP}(f, g) = \mathbb{E}_{x, z \in \mathcal{D}_x,\, x \neq z}\big[D(g(f, x), g(f, z))\big] = \mathbb{E}_{x, z \in \mathcal{D}_x,\, x \neq z}\big[\|g(f, x) - g(f, z)\|_0\big]$$
We would also want to know how surprising an explanation $g(f, x)$ is compared to explanations for training data. [Hazard et al., 2019] defines the conviction of an input $x$ with respect to $\mathcal{D}_x$ for $k$-Nearest Neighbor algorithms; similarly, we define the conviction of $g(f, x)$ relative to the explanations of training points, $\mathcal{D}_x$, using $g$.

Definition 7 (Conviction). Given a predictor $f$, an explanation function $g$, a probability distribution over explanations $\mathbb{P}_\phi(\cdot)$, and a data point $x$, we define the conviction of $g$ at $x$ for $\mathcal{D}$ as:
$$\mu_{CON}(f, g, \mathbb{P}_\phi; x) = \frac{\mathbb{E}_{z \sim \mathcal{D}_x}\big[I(g(f, z))\big]}{I(g(f, x))}$$
where $I(g(f, x)) = -\ln\big(\mathbb{P}_\phi(g(f, x))\big)$.

$\mu_{CON} = 0$ means that $g(f, x)$ is surprising. As $\mu_{CON} \to \infty$, $g(x)$ contains an expected amount of surprisal and can reasonably occur. We desire a higher $\mu_{CON}$, implying that $g$ behaves predictably. By changing the distribution to $\mathbb{P}_\phi(\cdot \mid y = f(x))$, the numerator to a conditional entropy where $f(z) = f(x)$, and the self-information to $I(g(f, x)) = -\ln\big(\mathbb{P}_\phi(g(f, x) \mid y = f(x))\big)$, we define the conditional conviction of $g(f, x)$ relative to explanations of the same predicted class.
Other techniques have also argued that $g(f, x)$ should recover the output of the original predictor, $f(x)$. Deemed compatibility, this criterion attempts to use $g$ as a simple proxy for reproducing the outputs of the complex $f$.

Definition 8 (Compatibility). Given a predictor $f$ and an explanation function $g$, we define the compatibility of $g$ for a dataset $\mathcal{D}$ as:
$$\mu_{COM}(f, g) = \frac{1}{N} \sum_{x \in \mathcal{D}_x} \left[\left(\sum_{i=1}^{d} g(f, x)_i\right) - f(x)\right]$$

The closer $\mu_{COM}$ is to 0, the more compatible the explanation function is; that is, the explanation function recovers the complex model's outputs well. This criterion is related to the completeness axiom of some explanation functions [Sundararajan et al., 2017]. An explanation function can be built to be compatible with the original model (or complete with respect to $f$). This is also related to the notion of post-hoc accuracy discussed in [Chen et al., 2018].
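A minimal Python sketch of Definition 8 follows; the names are illustrative and $f$ is assumed to return a scalar output. Depending on convention, one might instead average the absolute gaps rather than the signed gaps.

import numpy as np

def compatibility(f, g, data):
    # mu_COM: average gap between the summed attributions and the model output
    # over a dataset; closer to 0 means g better recovers the predictor's outputs.
    gaps = [np.sum(g(f, x)) - f(x) for x in data]
    return float(np.mean(gaps))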
Importance of Explanations

Not only do we want to ensure that $g$ faithfully identifies the most important features, but we also want to understand how well $f$ performs when $x_s$ is unobserved (or set to a baseline $x_s = \bar{x}_s$). In particular, we craft $S^*$ to contain the indices of the $|S|$ features with the highest $|g(f, x)_i|$:
$$S^* = \arg\max_{S \subseteq [d],\, |S| = k} \sum_{i \in S} |g(f, x)_i|$$
Therefore, $x_s$ is now a sub-vector of the most important features according to a specific $g$. As done in [Chang et al., 2019], we define a score $s_f$ for how confidently $f$ predicts an output $y$ in terms of log-odds:
$$s_f(y \mid x) = \log\big(\hat{P}_f(y \mid x)\big) - \log\big(1 - \hat{P}_f(y \mid x)\big)$$

Definition 9 (Deletion). Given a predictor $f$, an explanation function $g$, a point of interest $x$, a predicted output $y$, and a subset of important features $S^*$, we define the deletion score for $f$ at $x$ as:
$$\mu_{DEL}(f, g; x, y) = s_f(y \mid x) - s_f\big(y \mid x_{[x_s = \bar{x}_s]}\big)$$

Definition 10 (Addition). Given a predictor $f$, an explanation function $g$, a point of interest $x$, a predicted output $y$, and a subset of important features $S^*$, we define the addition score for $f$ at $x$ as:
$$\mu_{ADD}(f, g; x, y) = s_f\big(y \mid x_{[x_s = \bar{x}_s]}\big) - s_f(y \mid \bar{x})$$
While the deletion score conveys how the log-odds change when we delete the subset of important features from $x$, the addition score tells us how much the log-odds change when we add the subset to the baseline. Instead of re-scoring (via the change in log-odds) a modified input like $x_{[x_s = \bar{x}_s]}$, we can retrain the predictor $f$ on a dataset of modified inputs $\mathcal{D}_{x_{[x_s = \bar{x}_s]}}$. Addition and deletion are closely related to explanation selectivity, described in [Montavon et al., 2018].
Let $f_{\bar{x}_s}$ denote the predictor trained on the modified inputs with the most important pixels removed. As in [Hooker et al., 2019], we define the ROAR score as the difference in accuracy between the original predictor and the modified predictor. We can also train a predictor where the least important features (those in $x_c$) are removed. We denote that predictor $f_{\bar{x}_c}$ and define a KAR score, as proposed in [Hooker et al., 2019].

Definition 11 (ROAR). Given a predictor $f$, an explanation function $g$, a modified predictor $f_{\bar{x}_s}$, and a subset of important features $S^*$, we define the ROAR score for $g$ on a dataset $\mathcal{D}$ as:
$$\mu_{ROAR}(f, g, f_{\bar{x}_s}) = \frac{1}{N} \sum_{x \in \mathcal{D}_x} \mathbb{1}[f(x) = y] - \mathbb{1}[f_{\bar{x}_s}(x) = y]$$

Definition 12 (KAR). Given a predictor $f$, an explanation function $g$, a modified predictor $f_{\bar{x}_c}$, and a subset of important features $S^*$, we define the KAR score for $g$ on a dataset $\mathcal{D}$ as:
$$\mu_{KAR}(f, g, f_{\bar{x}_c}) = \frac{1}{N} \sum_{x \in \mathcal{D}_x} \mathbb{1}[f(x) = y] - \mathbb{1}[f_{\bar{x}_c}(x) = y]$$
Other Connections

We can also draw parallels between the three criteria proposed in the main paper and existing criteria in the literature. Low sensitivity is discussed as stability in [Melis and Jaakkola, 2018], as explanation continuity in [Montavon et al., 2018], as sensitivity in [Yeh et al., 2019], and as reliability in [Kindermans et al., 2019]. High faithfulness appears as relevance in [Samek et al., 2016], as gold set in [Ribeiro et al., 2016], as faithfulness in [Plumb et al., 2018], as sensitivity-n in [Ancona et al., 2018], and as infidelity in [Yeh et al., 2019]. Low complexity is loosely related to information gain from [Bylinskii et al., 2018] and to descriptive sparsity from [Warnecke et al., 2019].

Moreover, very recent literature has also tried to develop various other explanation function criteria: parameter randomization [Adebayo et al., 2018], clustering-based interpretations [Carter et al., 2019], existence of "unexplainable components" [Zhang et al., 2019], variants of perturbation techniques [Grabska-Barwińska, 2020], variants of mutual information measures [Davis et al., 2020], impact of iterative feature removal [Rieger and Hansen, 2020], and necessity and sufficiency of attributions [Wang et al., 2020].
B Proofs
For thoroughness, we elaborate on the proofs from the main
paper here.
B.1 Proof of Proposition 1
Proof. Assuming we fix the predictor $f$, let $g(x) = g(f, x)$ and let $\int$ denote $\int_{\rho(x, z) \leq r}$ for the rest of this proof.
$$\begin{aligned}
\mu_A(g_{agg}) &= \int D\big(g_{agg}(x), g_{agg}(z)\big)\, \mathbb{P}_x(z)\, dz \\
&= \int \big\|g_{agg}(x) - g_{agg}(z)\big\|_2\, dz \\
&= \int \big\|w\, g_1(x) + (1 - w)\, g_2(x) - w\, g_1(z) - (1 - w)\, g_2(z)\big\|_2\, dz \\
&= \int \big\|w\big(g_1(x) - g_1(z)\big) + (1 - w)\big(g_2(x) - g_2(z)\big)\big\|_2\, dz \\
&\leq \int \big\|w\big(g_1(x) - g_1(z)\big)\big\|_2 + \big\|(1 - w)\big(g_2(x) - g_2(z)\big)\big\|_2\, dz \\
&= \int w\, \big\|g_1(x) - g_1(z)\big\|_2 + (1 - w)\, \big\|g_2(x) - g_2(z)\big\|_2\, dz \\
&= \int w\, D\big(g_1(x), g_1(z)\big) + (1 - w)\, D\big(g_2(x), g_2(z)\big)\, dz \\
&= w \int D\big(g_1(x), g_1(z)\big)\, dz + (1 - w) \int D\big(g_2(x), g_2(z)\big)\, dz \\
&= w\, \mu_A(g_1) + (1 - w)\, \mu_A(g_2)
\end{aligned}$$
B.2 Proof of Proposition 2
Proof. To prove this, we just need to show that the sum of the squared distances is minimized by the mean of a set of explanation vectors:
$$g_{agg}(f, x) = \frac{1}{m} \sum_{i=1}^{m} g_i(f, x)$$
Recall we have a set of candidate explanation functions $\mathcal{G}_m = \{g_1, \dots, g_m\}$. Fix a point of interest $x$. Since $D$ is the $\ell_2$ distance and $p = 2$, we define a loss function as follows:
$$L\big(g_{agg}(f, x)\big) = \sum_{i=1}^{m} \big\|g_{agg}(f, x) - g_i(f, x)\big\|_2^2$$
We then compute the partial derivative with respect to each feature $j$ of our aggregate explanation, $g_{agg}(f, x)_j$, and set it to zero:
$$\frac{\partial L}{\partial g_{agg}(f, x)_j} = 2m\, g_{agg}(f, x)_j - 2 \sum_{i=1}^{m} g_i(f, x)_j = 0 \implies g_{agg}(f, x)_j = \frac{\sum_{i=1}^{m} g_i(f, x)_j}{m}$$
Stacking the $d$ coordinates gives:
$$g_{agg}(f, x) = \begin{bmatrix} \frac{1}{m}\sum_{i=1}^{m} g_i(f, x)_1 \\ \vdots \\ \frac{1}{m}\sum_{i=1}^{m} g_i(f, x)_d \end{bmatrix} = \frac{1}{m} \sum_{i=1}^{m} g_i(f, x)$$
B.3 Proof of Proposition 3
Proof. To prove this, we just need to show that the sum of the absolute distances is minimized by the feature-wise median of a set of explanation vectors:
$$g_{agg}(f, x) = \operatorname{med}\{g_i(f, x)\}$$
Recall we have a set of candidate explanation functions $\mathcal{G}_m = \{g_1, \dots, g_m\}$. Fix a point of interest $x$. Since $D$ is the $\ell_1$ distance and $p = 1$, we define a loss function as follows:
$$L\big(g_{agg}(f, x)\big) = \sum_{i=1}^{m} \big|g_{agg}(f, x) - g_i(f, x)\big|$$
Taking the partial derivative of the above with respect to each feature $j$ of our aggregate explanation $g_{agg}(f, x)_j$ yields:
$$\frac{\partial L}{\partial g_{agg}(f, x)_j} = \sum_{i=1}^{m} \operatorname{sign}\big(g_{agg}(f, x)_j - g_i(f, x)_j\big)$$
Now the above partial derivative only equals zero when the number of positive and negative terms is the same. The median is the only value where the number of positive terms (those greater than the median) and the number of negative terms (those less than the median) are equal. Thus, the median value for each feature $j$ minimizes the sum of absolute deviations loss crafted above, i.e., $g_{agg}(f, x)_j = \operatorname{med}\{g_1(f, x)_j, g_2(f, x)_j, \dots, g_m(f, x)_j\}$.
B.4 Alternative Proof of Theorem 5
Proof. We want to show that $g_{AVA}(f, x_{test}) = \Phi_{x_{test}}$ is indeed a vector of Shapley values. Let $g_{SHAP}(f, z) = \Phi_z$ be the vector of Shapley value contributions for a point $z \in N_k$. By [Lundberg and Lee, 2017], we know that $g_{SHAP}(f, z)_i = \phi_i(v_z)$ is a unique Shapley value for the characteristic function $v_z$. By linearity of Shapley values [Shapley, 1953], we know that:
$$\phi_i(v_{z_1} + v_{z_2}) = \phi_i(v_{z_1}) + \phi_i(v_{z_2}) \qquad (5)$$
This means that $\Phi_{z_1} + \Phi_{z_2}$ will yield a unique Shapley value contribution vector for the characteristic function $v_{z_1} + v_{z_2}$. By linearity (also called additivity), we also know that, for any scalar $\alpha$:
$$\alpha\, \phi_i(v_z) = \phi_i(\alpha\, v_z) \qquad (6)$$
This means that $\alpha\, \Phi_z$ will yield a unique Shapley value contribution vector for the characteristic function $\alpha\, v_z$. Now, to show $\Phi_{x_{test}}$ is a vector of Shapley values, it suffices to show that any $\phi_i(v_{AVA}) \in \Phi_{x_{test}}$ is a Shapley value. As such, we define $v_{AVA}$ to be the characteristic function of $g_{AVA}(f, x)$, where we find the average weighted importance score of the neighbors of $x_{test}$:
$$v_{AVA}(S) = \sum_{z \in N_k(x_{test})} \frac{v_z(S)}{\rho(x_{test}, z)} = \sum_{z \in N_k(x_{test})} \frac{1}{\rho(x_{test}, z)} \left(-\mathbb{E}_Y\left[\log \frac{1}{P_f(Y \mid z_s)}\ \Big|\ z\right]\right) \qquad (7)$$
By Equations (5), (6), and (7), we can see that $\phi_i(v_{AVA})$ is a Shapley value:
$$g_{AVA}(f, x_{test})_i = \phi_i(v_{AVA}) = \sum_{z \in N_k(x_{test})} \frac{g_{SHAP}(f, z)_i}{\rho(x_{test}, z)} = \sum_{z \in N_k(x_{test})} \frac{\phi_i(v_z)}{\rho(x_{test}, z)} \qquad (8)$$
C Details on Lowering Complexity
Given a fixed input $x$ and an explanation function $g_i$, the complexity can be rewritten as:
$$\mu_C(f, g_i; x) = -\sum_{k=1}^{d} \frac{|g_i(f, x)_k|}{\sum_{j \in [d]} |g_i(f, x)_j|} \ln\left(\frac{|g_i(f, x)_k|}{\sum_{j \in [d]} |g_i(f, x)_j|}\right)$$
This will help us determine how a small perturbation of the $k$th component of $g_i(f, x)$ will affect the complexity of $g_i$, which, in turn, will help find a lower complexity explanation. Note $g_i(f, x)_k$ is the $k$th component of $g_i(f, x)$. The partial derivative of $\mu_C(f, g_i; x)$ with respect to the $k$th component of $g_i(f, x)$ is:
$$\frac{\partial \mu_C(f, g_i; x)}{\partial g_i(f, x)_k} = -\big(1 + \ln(a)\big) \frac{\sum_{\substack{l=1 \\ l \neq k}}^{d} |g_i(f, x)_l|}{\Big(\sum_{j \in [d]} |g_i(f, x)_j|\Big)^2} + \sum_{\substack{l=1 \\ l \neq k}}^{d} \big(1 + \ln(b)\big) \frac{|g_i(f, x)_l|}{\Big(\sum_{j \in [d]} |g_i(f, x)_j|\Big)^2}$$
where $a = \frac{|g_i(f, x)_k|}{\sum_{j \in [d]} |g_i(f, x)_j|}$ and $b = \frac{|g_i(f, x)_l|}{\sum_{j \in [d]} |g_i(f, x)_j|}$.
We now provide an additional discussion and comparison of the two algorithms for lowering complexity. We presented two algorithms for finding a $g_{agg}$ with lower complexity: a gradient descent approach (Algorithm 1) and a region shrinking approach (Algorithm 2). Algorithm 1 relies on a greedy choice of selecting one of the $j$ directions to move in. This algorithm works best for regions that are smooth and have decreasing complexity around $g_i$ and $g_{avg}$. Since Algorithm 1 does not backtrack and moves component-wise, it can avoid areas of higher complexity, but it can take a sub-optimal step. For example, consider when $d = 2$. During a walk, Algorithm 1 may start at $g_i$, move in the $x$ direction, but then get stuck as the complexity in the $y$ direction increases. However, had we moved in the $y$ direction first and then in the $x$ direction, we may have found a minimum. The choice of component plagues this approach. On the other hand, Algorithm 2 solves the issue of getting stuck in regions of high complexity present in Algorithm 1. Since Algorithm 2 shrinks the region by choosing points in the convex combination, it can avoid the areas of high complexity. Since Algorithm 2 uses the line segments between the points chosen, it may be difficult to obtain the global minimum, which may not occur on the line segments considered.
Algorithm 1 Gradient-Descent Style Approach to finding $g_{agg}(f, x)$ with lower complexity

Require: $\alpha$; $g_i(f, x)$ for $i = 1, \dots, m$; fixed $x$
  Calculate the complexity of each $g_i(f, x)$:
  for $i = 1, \dots, m$ do
    $E_{g_i(x)} \leftarrow \mu_C(f, g_i; x) = -\sum_{k=1}^{d} \frac{|g_i(f,x)_k|}{\sum_{j \in [d]} |g_i(f,x)_j|} \ln\left(\frac{|g_i(f,x)_k|}{\sum_{j \in [d]} |g_i(f,x)_j|}\right)$
  end for
  $g_{avg}(f, x) \leftarrow \frac{1}{m} \sum_{i=1}^{m} g_i(f, x)$
  for $i = 1, \dots, m$ do
    Move in the direction of $g_{avg}(f, x)$ from $g_i(f, x)$ as long as the complexity decreases:
    $t_i \leftarrow g_i(f, x)$
    while the complexity of $t_i$ is decreasing and $t_i \neq g_{avg}(f, x)$ do
      for $j = 1, \dots, d$ do
        Calculate $\frac{\partial E_{t_i}}{\partial t_{ij}}$
        if the complexity decreases by moving in the $j$ direction towards $g_{avg}(f, x)$ then
          Update $t_{ij}$: $t_{ij} \leftarrow t_{ij} + \alpha \frac{\partial E_{t_i}}{\partial t_{ij}}$
        end if
      end for
    end while
    Move in the direction of $g_i(f, x)$ from $g_{avg}(f, x)$ as long as the complexity decreases:
    $q_i \leftarrow g_{avg}(f, x)$
    while the complexity of $q_i$ is decreasing and $q_i \neq g_i(x)$ do
      for $j = 1, \dots, d$ do
        Calculate $\frac{\partial E_{q_i}}{\partial q_{ij}}$
        if the complexity decreases by moving in the $j$ direction towards $g_i(x)$ then
          Update $q_{ij}$: $q_{ij} \leftarrow q_{ij} + \alpha \frac{\partial E_{q_i}}{\partial q_{ij}}$
        end if
      end for
    end while
    Take the $t_i$, $q_i$ that minimizes the complexity: $b_i \leftarrow \arg\min_{x \in \{q_i, t_i\}} E_x$
  end for
  Take the $b_i$ that minimizes the complexity: $g_{agg}(f, x) \leftarrow \arg\min_{b_i} E_{b_i}$
Algorithm 2 Region Shrinking Approach to finding $g_{agg}(f, x)$ with lower complexity

Require: $g_i(f, x)$ for $i = 1, \dots, m$; fixed $x$
  $t \leftarrow 0$
  Add all the $g_i$ into set $S$: $S \leftarrow \{g_i(f, x),\ i = 1, \dots, m\}$
  repeat
    Initialize $S' \leftarrow \emptyset$
    for every pair of points $P_1, P_2$ in $S$ do
      Find the point $P$ with the minimum entropy in the convex combination of $P_1$ and $P_2$
      Add point $P$ to $S'$
    end for
    Choose the $N$ minimum-entropy points in $S'$ to form $S$
    $t \leftarrow t + 1$
  until $t = K$
  Take the element in set $S$ that minimizes the entropy: $g_{agg}(f, x) \leftarrow \arg\min_{k \in S} E_k$
A combination of the two approaches can be used. First, Algorithm 2 can be used to shrink the region being considered into a set, $S$, of points with low complexity. This can avoid getting stuck in areas of high complexity, as in Algorithm 1. Then, Algorithm 1 can be used to move around the points in set $S$ in order to find the global minimum, which may not occur on the line segments considered in Algorithm 2; it can refine the points in set $S$ to obtain a lower complexity. In sum, we can shrink the region considered into several candidate sets and then refine the points in each set by perturbing and performing greedy walks around them to find a $g_{agg}$ with low complexity.
D Experimental Setup
We provide additional details on the datasets used and their
respective models from our experiments.
• Iris [Dua and Graff, 2017]: The Iris dataset consists of 150 datapoints: 50 per class and 4 features per datapoint. We use a one layer multilayer perceptron trained to 96% accuracy as our $f$.
• Adult [Dua and Graff, 2017]: Each of the 48,842 datapoints has 38 features and falls in one of two classes. Note we label-encode categorical attributes. We use a one layer MLP (40 hidden nodes with leaky-ReLU activation) trained to an accuracy of 82%.
• MIMIC-III [Johnson et al., 2016]: MIMIC-III (Medical Information Mart for Intensive Care III) is a large electronic health record dataset comprised of health-related data of over 40,000 patients who were admitted to the critical care units of Beth Israel Deaconess Medical Center between the years 2001 and 2012. MIMIC-III consists of demographics, vital sign measurements, lab test results, medications, procedures, caregiver notes, imaging reports, and mortality of the ICU patients. Using the MIMIC-III dataset, we extracted seventeen real-valued features deemed critical in the sepsis diagnosis task as per [Purushotham et al., 2018]. These are the processed features we extracted for every sepsis diagnosis (a binary variable indicating the presence of sepsis): Glasgow Coma Scale, Systolic Blood Pressure, Heart Rate, Body Temperature, PaO2/FiO2 ratio, Urine Output, Serum Urea Nitrogen Level, White Blood Cells Count, Serum Bicarbonate Level, Sodium Level, Potassium Level, Bilirubin Level, Age, Acquired Immunodeficiency Syndrome, Hematologic Malignancy, Metastatic Cancer, and Admission Type. We used two layers of 16 hidden nodes each and leaky-ReLU activation to get an accuracy of 91% on the sepsis prediction task.
• MNIST with CNN [LeCun et al., 1998]: We use a CNN trained to 90% accuracy with the following architecture: one convolutional layer with 32 5×5 filters and ReLU activation; a max pooling layer with a 2×2 filter and stride of 2; a convolutional layer with 64 5×5 filters and ReLU activation; a max pooling layer with a 2×2 filter and stride of 2; and a final dense layer with 10 output neurons. We used the MNIST dataset with 60,000 28×28 grayscale images of the 10 digits, along with a test set of 10,000 images.
Note that we fix a dataset-model pairing for all experiments. In practice, when calculating average sensitivity, we use the following formulation:
$$\mu_A(f, g, x) = \frac{1}{|N_r|} \sum_{z \in N_r} \frac{D\big(g(f, x), g(f, z)\big)}{\rho(x, z)}$$
Effectively, we want to ensure that the distance between an explanation of $x$ and an explanation of $z$, a point in the neighborhood of $x$, is proportional to the distance between $x$ and $z$. Some recent work has shown that average sensitivity can be lowered with simple smoothing tricks applied to explanation functions or with adversarial training of the predictor itself.