Oracle design as de-black-boxer.

post by Stuart_Armstrong · 2016-09-02T13:38:07.000Z

This paper ("Why Should I Trust You?": Explaining the Predictions of Any Classifier, by Ribeiro, Singh, and Guestrin, 2016) introduces an interpreter for learning algorithms, aimed at clarifying what is happening inside the algorithm.

The interpreter, Local Interpretable Model-agnostic Explanations (LIME), gives the human user some idea of the important factors going into the learning algorithm's decision, such as which input features (words in a document, regions of an image) most influenced a particular prediction. A minimal usage sketch follows.
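As a concrete illustration (not from the original post), here is a minimal sketch of how the published `lime` Python package is typically used on tabular data; the dataset and classifier are arbitrary stand-ins:

```python
# Minimal LIME sketch; assumes the `lime` and scikit-learn packages
# are installed. The iris data and random forest are illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

# Explain one prediction: LIME fits a simple local model around this
# instance and reports each feature's contribution to the output.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=4
)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```

The printed feature/weight pairs are the kind of "important factors" display the LIME paper presents to the user.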

We can use the first Oracle design here for a similar de-black-boxing purpose. The "Counterfactually Unread Agent" can already be used to see which values of a specific random variable maximise a certain utility function. We could also search across a whole slew of random variables, to see which one is most important for maximising the given utility function, producing a ranking of variables by their influence; a sketch of such a search is below.
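The following sketch is my illustration, not the post's actual construction: it models the Oracle as a function reporting the (Monte Carlo) expected utility under a counterfactual setting of one variable, then ranks variables by the utility gain their best setting achieves over the baseline. The variable names and the toy utility function are hypothetical:

```python
# Hedged sketch: `expected_utility` stands in for the Oracle's report
# of E[U | X_i = x]. The toy utility depends strongly on x2, weakly on
# x0, and not at all on x1, so the search should rank them x2, x0, x1.
import numpy as np

rng = np.random.default_rng(0)
VARIABLES = ("x0", "x1", "x2")

def expected_utility(fixed: dict) -> float:
    """Monte Carlo estimate of E[U] with the settings in `fixed` held."""
    samples = {
        name: fixed.get(name, rng.normal(size=10_000))
        for name in VARIABLES
    }
    u = 0.2 * samples["x0"] + 2.0 * np.tanh(samples["x2"])
    return float(np.mean(u))

baseline = expected_utility({})
candidates = np.linspace(-3, 3, 61)

# For each variable, find the counterfactual setting that maximises
# expected utility, and record the gain over the baseline.
importance = {}
for name in VARIABLES:
    gains = [expected_utility({name: x}) for x in candidates]
    best = int(np.argmax(gains))
    importance[name] = (candidates[best], gains[best] - baseline)

# Rank variables by how much their best setting improves utility.
for name, (best_x, gain) in sorted(
    importance.items(), key=lambda kv: -kv[1][1]
):
    print(f"{name}: best value {best_x:+.2f}, utility gain {gain:+.3f}")
```

The variable with the largest gain is the one most important for maximising the utility function, which is the kind of output the post suggests extracting from the Oracle.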
