Experiment replication and reproduction are central to the empirical evaluation of Recommender Systems (RS), yet they remain an open issue in the field. When an experiment is repeated by a different researcher and exactly the same result is obtained, we say the experiment has been replicated. When the results are not exactly the same but the conclusions are compatible with the original ones, we have a reproduction of the experiment.
Offline evaluation of recommender systems requires an implementation of the algorithm or technique to be evaluated, a set of performance measures for comparative evaluation, and an experimental protocol establishing in detail how to handle the data and compute the metrics. Online evaluation similarly requires an algorithm implementation and a population of users to survey (by means of an A/B test, for instance). In this context, as in offline evaluation but perhaps even more importantly, an experimental protocol also needs to be established and followed.
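To make these three components concrete, the following minimal sketch wires them together for a toy offline experiment. Everything here is illustrative: the synthetic interaction data, the most-popular baseline recommender, and the 80/20 random hold-out split are assumptions for the example, not a prescription from any particular study.

```python
import random
from collections import Counter

# Toy interaction data (illustrative): (user, item) pairs, seeded per user.
interactions = [(u, i) for u in range(20)
                for i in random.Random(u).sample(range(50), 10)]

# Experimental protocol, step 1: data handling (random 80/20 hold-out split).
rng = random.Random(0)
shuffled = interactions[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]

# Algorithm implementation: a trivial most-popular recommender.
popularity = Counter(item for _, item in train)

def recommend(user, k=5):
    """Rank items by global popularity, excluding items the user saw in training."""
    seen = {i for u, i in train if u == user}
    ranked = [i for i, _ in popularity.most_common() if i not in seen]
    return ranked[:k]

# Performance measure: precision@k averaged over test users.
def precision_at_k(k=5):
    relevant = {}
    for u, i in test:
        relevant.setdefault(u, set()).add(i)
    scores = [len(set(recommend(u, k)) & items) / k
              for u, items in relevant.items()]
    return sum(scores) / len(scores)

print(round(precision_at_k(5), 3))
```

Even in this small sketch, several design parameters (split ratio, random seed, cutoff k, how seen items are excluded) must be fixed and reported before the resulting number is comparable across studies, which is exactly the difficulty discussed next.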
Even when a set of publicly available resources (data and algorithm implementations) exists in the community, very often research studies do not report comparable results for the same methods “under the same conditions”. This is due to the high number of experimental design parameters in recommender system evaluation, and the huge impact of the experimental design on the outcomes.
In order to achieve reproducibility and replication, several strategies can be considered, such as sharing source code, standardizing agreed-upon evaluation metrics and protocols, or releasing public experimental design software, all of which have difficulties of their own. Similarly, for online evaluation, an extensive analysis of the population of test users should be provided. While the problem of reproducibility and replication has been recognized in the community, the need for a solution remains largely unmet. This, together with the need for further discussion and methodological standardization in both reproducibility and replication, motivates the workshop.