External Methods for Explainable AI

At the STOR-i conference this year, there was a talk on “Optimising Explainable AI”. Broadly, it looked at ways to make an AI (by which I mean a classifier of some sort) more transparent so that you can see why it makes the decisions it’s making.

The central concept was this: any AI will have a big long explanation as to why it did what it did (maybe in the form of a polynomial or large matrix of weights) that is in most cases too long for a human to understand. You need the explanation to be shorter (while still being an explanation). The way you do this is to impose some kind of “penalty” related to the size of the explanation, and hope that you can get a tradeoff between an AI that classifies well and an AI with short explanations. Example penalties include sparsity constraints in generalised linear models (such as Lasso regression) and restrictions on the allowable depth of decision trees.
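To make the penalty idea concrete, here is a minimal sketch of the two examples above using scikit-learn. The synthetic data and parameter values are my own invention, not anything from the talk: the point is just that the Lasso's L1 penalty zeroes out most coefficients, and capping a decision tree's depth keeps its rule set short enough to read.

```python
import numpy as np
from sklearn.datasets import make_regression, make_classification
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeClassifier

# Sparsity penalty: the L1 term in Lasso pushes most coefficients to exactly
# zero, so the "explanation" is the handful of features that survive.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])

# Size penalty: capping the depth of a decision tree keeps the rule list
# human-readable, at some cost in classification performance.
Xc, yc = make_classification(n_samples=200, n_features=20, random_state=0)
shallow = DecisionTreeClassifier(max_depth=3).fit(Xc, yc)
deep = DecisionTreeClassifier(max_depth=None).fit(Xc, yc)
print("leaves in shallow tree:", shallow.get_n_leaves(), "vs deep tree:", deep.get_n_leaves())
```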

However, these are internal methods. They are things you consider when you are designing your AI. Suppose you aren’t allowed to do that – maybe you absolutely *must* get the best classification performance possible and can’t afford to factor in things like explainability, or maybe the internal structure of your classifier just isn’t suited to making its explanations shorter via any method you’ve found so far. What do you do then?

How To Train Your Classifier

A trained classifier will take an input, for example the image below, and output one of a preset number of categories to which it belongs, for instance either “dog” or “cat”. In actuality, classifiers give a score: a number between 0 and 1 saying “how doggy vs catty is this picture?”. All pictures above a threshold are classified as dogs and those below as cats – setting that threshold is the job of the person who trains the model, and is based on a tradeoff between the costs of misclassifying in either direction.

Markus, who is a very good boi. Source: my fellow STOR-i student Tessa
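As a toy illustration of that thresholding tradeoff (the scores and true labels below are invented, not from any real model), moving the threshold trades dogs missed against cats wrongly called dogs:

```python
import numpy as np

# Hypothetical scores from a trained dog-vs-cat classifier on six test images,
# where 1.0 means "very dog" and 0.0 means "very cat".
scores = np.array([0.92, 0.75, 0.48, 0.31, 0.55, 0.10])
true_is_dog = np.array([True, True, False, False, True, False])

for threshold in (0.3, 0.5, 0.7):
    predicted_is_dog = scores >= threshold
    dogs_called_cats = np.sum(true_is_dog & ~predicted_is_dog)
    cats_called_dogs = np.sum(~true_is_dog & predicted_is_dog)
    print(f"threshold {threshold}: {dogs_called_cats} dogs missed, "
          f"{cats_called_dogs} cats wrongly called dogs")
```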

However, the score here is what’s important. The trained classifier can be thought of as a function f: {space of all possible images of the correct resolution} -> [0, 1], without caring about its internal workings at all. It’s even (usually) a continuous function, for some notion of “continuity”. And what do we do in OR when we find an interesting continuous function?

We optimise it, of course!

Now, the good thing here is we’re not really looking for global optima. We’re not particularly interested in constructing the “doggiest dog that ever did dog”[1], and thank goodness we aren’t, because that image space is extremely high-dimensional (number of dimensions = number of pixels). Instead, we can look at what our optimiser does when we give it something fun from our test set as a starting point.

“Here is a dog – please make it more of a cat”. “Here is a cat – please make it EVEN MORE of a cat”. “Here is a cat that you misclassified as a dog – please make it more cat”[2]. By looking at what the optimiser does in response to this, you can learn what features the classifier is considering “important” in relation to that particular image. Does it make the ears rounder when asked to go more doglike, for instance?
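Here is a rough sketch of what that game might look like in code. The cat_score function below is a made-up stand-in for the real black-box classifier, and the tiny 8×8 “image” is just to keep the optimiser cheap; the only thing that matters is that we call the classifier as a function and never look inside it.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for a trained classifier: a black-box score in [0, 1], where higher
# means "more cat". A real one would be a neural network; this dummy simply
# rewards bright pixels in the top-left corner (purely illustrative).
def cat_score(image_flat):
    image = image_flat.reshape(8, 8)
    return float(1 / (1 + np.exp(-4 * image[:3, :3].mean())))

rng = np.random.default_rng(0)
start_image = rng.uniform(0, 1, size=64)   # "here is a picture from the test set"
print("score before:", round(cat_score(start_image), 3))

# "Please make it EVEN MORE of a cat": maximise the score (i.e. minimise its
# negative) starting from that picture, keeping pixel values valid.
result = minimize(lambda x: -cat_score(x), start_image,
                  method="L-BFGS-B", bounds=[(0, 1)] * 64)
print("score after: ", round(cat_score(result.x), 3))

# The interesting part is not the final score but *which* pixels moved most.
changes = np.abs(result.x - start_image).reshape(8, 8)
print("most-changed pixel:", np.unravel_index(changes.argmax(), (8, 8)))
```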

Turns out you don’t even need an optimiser – just by using a high-dimensional gradient estimator, you can create a map of which pixels the classifier considers “locally important” in an image. There’s an old urban myth about a classifier for American vs Soviet tanks that was actually classifying the background light levels as cloudy vs sunny, because of the days on which the training-set photos were taken, and therefore failed spectacularly in live use[3]. That kind of problem is easy to spot if you can see that the classifier is mainly looking at the background of an image rather than the features it is supposed to be identifying.

An EU tank (that is neither American nor Soviet)
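A sketch of that pixel-importance idea, again with a made-up stand-in classifier: estimate the gradient by nudging each pixel a little and recording how much the score moves.

```python
import numpy as np

# Same kind of made-up stand-in classifier as in the sketch above: a black-box
# score in [0, 1] that (purely for illustration) only looks at the top-left corner.
def cat_score(image):
    return float(1 / (1 + np.exp(-4 * image[:3, :3].mean())))

def pixel_importance(score_fn, image, eps=1e-3):
    """Crude finite-difference gradient estimate: nudge each pixel slightly and
    record how much the score moves. Large values mark pixels the classifier
    treats as locally important."""
    base = score_fn(image)
    importance = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        bumped = image.copy()
        bumped[idx] += eps
        importance[idx] = abs(score_fn(bumped) - base) / eps
    return importance

image = np.random.default_rng(1).uniform(0, 1, size=(8, 8))
saliency = pixel_importance(cat_score, image)
# If the big values all sit in the background rather than on the animal itself,
# you have found your tanks-and-sunshine problem.
print(np.round(saliency, 2))
```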

By using an external method such as this to provide explainability to every decision an AI makes, we avoid the tradeoffs internal methods force us to make. External methods add rather than compromise.[4]

[1] Although I might be interested in the “cattiest cat that ever did cat”.

[2] Did I mention I like cats?

[3] Almost certainly a false myth; see https://www.gwern.net/Tanks for the story. It’s probably stuck around both because it’s such a useful explanatory device for elementary machine learning and because it lets us poke fun at the incompetence of military science & technology.

[4] Ribeiro et al. (2016), “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, https://arxiv.org/abs/1602.04938, is a brilliant resource for diving into this further.