Explainability has emerged as a critical AI research objective, but the breadth of proposed methods and application domains suggests that criteria for explanation vary greatly. In particular, what counts as a good explanation, and what kinds of explanation are computationally feasible, have become trickier questions in light of opaque “black box” systems such as deep neural networks. Explanation in such cases has drifted from what many philosophers stipulated as having to involve deductive and causal principles to mere “interpretation,” which approximates what happened in the target system to varying degrees. However, such post hoc constructed rationalizations are highly problematic for social robots that operate interactively in spaces shared with humans. For in such social contexts, explanations of behavior, and in particular justifications for violations of expected behavior, should make reference to socially accepted principles and norms.
In this article, we show how a social robot’s actions can face explanatory demands: for how it came to act on its decision, for what goals, tasks, or purposes its design had those actions pursue, and for what norms or social constraints the system recognizes in the course of its action. As a result, we argue that explanations for social robots will need to be accurate representations of the system’s operation along causal, purposive, and justificatory lines. These explanations will need to generate appropriate references to principles and norms; explanations based on mere “interpretability” will ultimately fail to connect the robot’s behaviors to their appropriate determinants. We then lay out the foundations of a cognitive robotic architecture for HRI, together with particular component algorithms, for generating explanations and engaging in justificatory dialogues with human interactants. Such explanations track the robot’s actual decision-making and behavior, which themselves are determined by normative principles the robot can describe and use for justifications.