CRMRR (Cluster Robust Mean Reciprocal Rank) is a novel rank-based metric for evaluating embedding-based Knowledge Graph Completion models.
Installation • Quickstart • Implementation • Experiments • Citation • Contributing
The requirements for this repository can be installed from GitHub with:
pip install -r https://raw.githubusercontent.com/simoncharmms/crmrr/requirements.txt
For this demonstration, the Python library PyKEEN is required. We recommend installing the latest version of PyKEEN directly from the source code on GitHub with:
pip install git+https://github.com/pykeen/pykeen.git
and then downgrade to the version we used, which is `pykeen==1.10.1` (for example with `pip install pykeen==1.10.1`).
This example shows how to train three different Knowledge Graph (KG) embedding models on a dataset and test the Cluster Robust Mean Reciprocal Rank (CRMRR) against other rank-based metrics.
The following example trains and evaluates the embedding models ComplEx, TransR and DistMult using the PyKEEN library.
In contrast to the PyKEEN default settings, we use the stochastic local closed world assumption (sLCWA) training approach. Furthermore, we evaluate the embeddings with PyKEEN's rank-based evaluation, which includes the Mean Rank (MR), the Mean Reciprocal Rank (MRR) and Hits@K (with K = 10), before we compute the CRMRR.
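For orientation, a plain PyKEEN pipeline call with this training approach and evaluator could look roughly as follows; the dataset, epoch count and metric key below are illustrative assumptions, not the exact experiment configuration:

```python
from pykeen.pipeline import pipeline

for model in ['TransR', 'DistMult', 'ComplEx']:
    result = pipeline(
        dataset='Nations',      # small placeholder dataset, not one of the benchmark KGs
        model=model,
        training_loop='sLCWA',  # stochastic local closed world assumption
        evaluator='rankbased',  # yields MR, MRR and Hits@K
        training_kwargs=dict(num_epochs=50),
    )
    # 'mrr' is assumed to resolve to the mean reciprocal rank in PyKEEN's metric lookup.
    print(model, result.metric_results.get_metric('mrr'))
```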
The experiments can be reproduced using the following commands:
from main import main

result = main(
    models=['TransR', 'DistMult', 'ComplEx']
)
For this quickstart, standard settings as described in `src/constants.py` apply.
The results are returned and stored as a dataframe `result`.
Additionally, plots are produced unless you set `PLOTS = False` in `src/constants.py`.
Below is the implementation logic of CRMRR, along with some notes on the models, datasets, training modes, evaluators, and metrics implemented in `pykeen`.
In embedding-based KG Completion (KGC), the restriction of the model to learning completion candidates from existing triples in the graph is a problem under the Open World Assumption.
Rank-based metrics for KGC do not reflect this problem and thus convey only the likelihood of a true completion candidate given the existing triples.
We propose a novel rank-based metric that reflects this circumstance, based on the assumption of cluster robust inference.
CRMRR augments the MRR with a penalty term derived from the cluster structure of the graph; the full definition is given in our article (see the citation below).
Under the assumption of cluster robust errors, this penalty term favors the choice of the most realistic embedding model: it penalizes less in the case of few clusters, and more strongly the higher the given rank.
The implementation of the CRMRR can be found in `src/cluster_robust_mean_reciprocal_rank.py`.
The Leiden algorithm assigns nodes to arbitrary communities and then iteratively moves nodes between communities to optimize an objective function such as modularity, refining the partition until no further improvement is found.
The `src/cluster_graphs.py` file contains all methods required for graph clustering.
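One way to obtain such a clustering is via the `leidenalg` package; below is a minimal sketch on a toy graph (the graph construction and partition type are assumptions, not the repository's own code):

```python
import igraph as ig
import leidenalg

# Toy graph; in the experiments, the benchmark KGs are clustered instead.
graph = ig.Graph.Famous("Zachary")

# Run the Leiden algorithm, optimizing modularity as the objective function.
partition = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition)

print(len(partition))        # number of communities found
print(partition.membership)  # community index assigned to each node
```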
The following 3 embedding models are used for testing CRMRR.
| Name | Model | Interaction | Citation |
|---|---|---|---|
| ComplEx | `pykeen.models.ComplEx` | `pykeen.nn.ComplExInteraction` | Trouillon et al., 2016 |
| DistMult | `pykeen.models.DistMult` | `pykeen.nn.DistMultInteraction` | Yang et al., 2014 |
| TransR | `pykeen.models.TransR` | `pykeen.nn.TransRInteraction` | Lin et al., 2015 |
The following metrics are used as a benchmark for CRMRR.
| Name | Interval | Direction | Description | Type |
|---|---|---|---|---|
| Hits @ K | [0, 1] | 📈 | The relative frequency of ranks not larger than a given k. | Ranking |
| Mean Rank (MR) | [1, ∞) | 📉 | The arithmetic mean over all ranks. | Ranking |
| Mean Reciprocal Rank (MRR) | (0, 1] | 📈 | The inverse of the harmonic mean over all ranks. | Ranking |
| Cluster Robust Mean Reciprocal Rank (CRMRR) | (-∞, ∞) | 📈 | The inverse of the harmonic mean over all ranks, augmented with a cluster-based penalty term. | Ranking |
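The baseline metrics can be computed directly from the list of ranks assigned to the true entities; here is a minimal, self-contained sketch (not the repository's implementation):

```python
import numpy as np

# Example ranks of the true completion candidates (illustrative values).
ranks = np.array([1, 3, 2, 10, 50, 4])

mr = ranks.mean()                  # Mean Rank: arithmetic mean over all ranks
mrr = (1.0 / ranks).mean()         # MRR: inverse of the harmonic mean over all ranks
hits_at_10 = (ranks <= 10).mean()  # Hits@10: share of ranks not larger than 10

print(f"MR={mr:.2f}, MRR={mrr:.3f}, Hits@10={hits_at_10:.2f}")
```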
Note that reproducing the full experiments from the paper can take a considerable amount of training time.
Therefore, in `constants.py` you can specify whether you want to sample the data and to what extent:
# Indicate whether to sample the data or not.
SAMPLE = True
# Specify sample ratio.
SAMPLE_RATIO = 0.001
Furthermore, we provide some pre-trained embeddings in `models`.
The following datasets were retrieved from SNAP and selected to cover a range of sizes, degrees and clustering behaviours.
| Name | Description | Citation | Relations | Nodes |
|---|---|---|---|---|
| web-google | Pages and directed edges represent hyperlinks between them. | J. Leskovec et al., 2009 | 875 k | 5,105 k |
| web-amazon | Amazon reviews up to 2013. | McAuley and Leskovec, 2013 | 273 k | 925 k |
| sx-stackoverflow | Interactions on the Stack Exchange website Stack Overflow. | Paranjape et al., 2017 | 482 k | 634 k |
| musae-facebook | Facebook pages and links between them. | Rozemberczki et al., 2019 | 27 k | 171 k |
| Industry KG | Automotive vehicle names and sales countries. | Schramm, 2023 | 5 k | 116 k |
| BTC alpha | Trust network of people who trade using Bitcoin on a platform called Bitcoin Alpha. | Kumar et al., 2018 | 2 k | 3 k |
| BTC otc | Trust network of people who trade using Bitcoin on a platform called Bitcoin OTC. | Kumar et al., 2018 | 5 k | 3 k |
ATTENTION: Please use the links above to fetch web-google and sx-stackoverflow; these datasets are not pushed to GitHub due to their file size.
The benchmark datasets cover a variety of KGs and differ in node count, relation count, mean degree, transitivity and clustering coefficient.
The `src/get_benchmark_datasets.py` file contains all methods required for data loading.
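To give an idea of how a two-column SNAP edge list can be turned into PyKEEN triples, here is an illustrative sketch; the file path, column names, placeholder relation and split ratios are assumptions, not the repository's actual loading code:

```python
import pandas as pd
from pykeen.triples import TriplesFactory

SAMPLE = True
SAMPLE_RATIO = 0.001

# SNAP edge lists have two columns (source, target) and '#' comment lines.
edges = pd.read_csv(
    "data/web-Google.txt",  # assumed local path to the downloaded edge list
    sep="\t",
    comment="#",
    names=["head", "tail"],
    dtype=str,
)
if SAMPLE:
    # Subsample the edges to keep training time manageable.
    edges = edges.sample(frac=SAMPLE_RATIO, random_state=42)

# SNAP graphs have untyped edges, so a single placeholder relation is used.
edges["relation"] = "linked_to"
tf = TriplesFactory.from_labeled_triples(
    edges[["head", "relation", "tail"]].to_numpy()
)
training, testing = tf.split([0.8, 0.2])
```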
For the experiments, the following stopper, optimizer and evaluator were chosen:
|  | Reference | Description |
|---|---|---|
| Stopper | `early` | A harness for early stopping. |
| Optimizer | `Adam` | Stochastic gradient descent with adaptive moment estimation. |
| Evaluator | `rankbased` | A rank-based evaluator for KGE models. |
Experiments on the benchmark datasets yielded the following results:
- $CRMRR$ outperformed $MR$, $MRR$ and $H_{10}$ in 71% of the cases.
- $CRMRR$ tends to perform well on large KGs.
- The exploration of other KGC methods, such as tensor factorization or rule-based approaches, is promising due to the model-agnostic properties of $CRMRR$.
We implemented a benchmarking study which is described in our article. If you find CRMRR useful, consider citing our work as follows:
@InProceedings{10.1007/978-3-031-40283-8_25,
  author    = {Schramm, Simon and Niklas, Ulrich and Schmid, Ute},
  editor    = {Jin, Zhi and Jiang, Yuncheng and Buchmann, Robert Andrei and Bi, Yaxin and Ghiran, Ana-Maria and Ma, Wenjun},
  title     = {Cluster Robust Inference for Embedding-Based Knowledge Graph Completion},
  booktitle = {Knowledge Science, Engineering and Management},
  year      = {2023},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {284--299},
  isbn      = {978-3-031-40283-8}
}
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See my website for more information on potential research collaborations.
This project has been supported by the BMW Group.