A Closer-to-Reality Model for Comparing Relevant Dimensions of Recommender Systems, with Application to Novelty
Providing fair and convenient comparisons between recommendation algorithms—where algorithms may focus on a traditional dimension (accuracy) and/or less traditional ones (e.g., novelty, diversity, serendipity)—is a key challenge in the recent development of recommender systems. This paper focuses on novelty and presents a new, closer-to-reality model for evaluating the quality of a recommendation algorithm by reducing the popularity bias inherent in traditional training/test set evaluation frameworks, which are dominated by popular items and their characteristics. In the suggested model, each interaction has a probability of being included in the test set that depends on a specific feature related to the dimension under study (novelty in this work). The goal of this paper is to reconcile, in terms of evaluation (and therefore comparison), the accuracy and novelty dimensions of recommendation algorithms, leading to a more realistic comparison of their performance. The results obtained from two well-known datasets show how the behavior of state-of-the-art ranking algorithms evolves as novelty is progressively, and fairly, given more importance in the evaluation procedure, and could prompt changes in the decision processes of organizations that deploy recommender systems.
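To make the feature-dependent split concrete, the following is a minimal Python sketch of the idea, not the paper's actual procedure: every interaction is assigned a test-set inclusion probability that grows with its item's novelty. The inverse log-popularity novelty proxy, the alpha blending parameter, and all names are illustrative assumptions.

    import numpy as np
    import pandas as pd

    def novelty_weighted_split(interactions: pd.DataFrame,
                               test_fraction: float = 0.2,
                               alpha: float = 1.0,
                               seed: int = 42):
        """Split interactions into train/test sets, where each interaction's
        chance of landing in the test set grows with the novelty (here:
        inverse log-popularity) of its item.

        alpha interpolates between a uniform random split (alpha=0) and a
        fully novelty-driven split (alpha=1).
        """
        rng = np.random.default_rng(seed)

        # Item popularity = number of interactions per item.
        popularity = interactions["item_id"].map(
            interactions["item_id"].value_counts())

        # A common novelty proxy: self-information, -log2 p(item).
        p_item = popularity / len(interactions)
        novelty = -np.log2(p_item)

        # Normalize novelty to [0, 1], blend it with a uniform baseline,
        # then rescale so the expected test size matches test_fraction.
        novelty_norm = (novelty - novelty.min()) / \
            (novelty.max() - novelty.min() + 1e-12)
        weight = (1 - alpha) + alpha * novelty_norm
        inclusion_prob = test_fraction * weight / weight.mean()

        # Draw one Bernoulli trial per interaction.
        in_test = rng.random(len(interactions)) < inclusion_prob.clip(0, 1)
        return interactions[~in_test], interactions[in_test]

Under this sketch, a call such as novelty_weighted_split(df, alpha=0.5) would yield a test set whose expected size is still test_fraction of the data, but in which long-tail (novel) items are over-represented relative to a uniform split, which is the bias-reduction effect the abstract describes.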