News

[July 27, 2022] ranx will be featured in CIKM 2022, the 31st ACM International Conference on Information and Knowledge Management!
[August 29, 2022] ranx 0.2.9 is out.
  Filetypes are now automatically inferred from file extensions (.json → json, .trec → trec, .txt → trec). Default behavior can be overridden with the kind parameter (this should allow for backward compatibility).
  Two-sided Paired Student's t-Test is now the default statistical test used when calling compare (it is much faster than Fisher's and they usually agree).
  Loading / saving Qrels and Run from / to json files is now much faster thanks to orjson.
[June 29, 2022] Added support for Tukey's HSD Test.
[June 28, 2022] Added support for Bpref and Rank-biased Precision (RBP) metrics.
[June 9, 2022] Added support for 25 fusion algorithms, six normalization strategies, and an automatic fusion optimization functionality in v.0.2.
Check out the official documentation and Jupyter Notebook for further details on fusion and normalization.
Introduction
ranx is a library of fast ranking evaluation metrics implemented in Python, leveraging Numba for high-speed vector operations and automatic parallelization. It offers a user-friendly interface to evaluate and compare Information Retrieval and Recommender Systems. ranx allows you to perform statistical tests and export LaTeX tables for your scientific publications. Moreover, ranx provides several fusion algorithms and normalization strategies, and an automatic fusion optimization functionality. ranx was featured in ECIR 2022, the 44th European Conference on Information Retrieval.
If you use ranx to evaluate results or conduct experiments involving fusion for your scientific publication, please consider citing it.
For a quick overview, follow the Usage section.
For an in-depth overview, follow the Examples section.
Features
Metrics
 Hits
 Hit Rate
 Precision
 Recall
 F1
 R-Precision
 Bpref
 Rank-biased Precision (RBP)
 Mean Reciprocal Rank (MRR)
 Mean Average Precision (MAP)
 Normalized Discounted Cumulative Gain (NDCG)
The metrics have been tested against TREC Eval for correctness.
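To make a few of the definitions above concrete, here is a minimal pure-Python sketch of Precision, Recall, F1, and Rank-biased Precision for a single query. It uses the textbook formulas and does not call ranx. The function names, the toy ranking, and the persistence value p=0.8 are illustrative assumptions, and ranx's own Numba-based implementation may treat edge cases differently.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranking[:k]) / k


def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    return sum(doc in relevant for doc in ranking[:k]) / len(relevant)


def f1_at_k(ranking, relevant, k):
    """Harmonic mean of precision@k and recall@k."""
    p = precision_at_k(ranking, relevant, k)
    r = recall_at_k(ranking, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def rbp(ranking, relevant, p=0.8):
    """Rank-biased Precision: (1 - p) * sum of p**(rank - 1) over relevant ranks."""
    # enumerate() starts at 0, so the exponent i equals rank - 1
    return (1 - p) * sum(p ** i for i, doc in enumerate(ranking) if doc in relevant)


# Toy data (assumed for illustration): relevant documents sit at ranks 1 and 3
ranking = ["d_12", "d_23", "d_25", "d_36", "d_32"]
relevant = {"d_12", "d_25"}

print(precision_at_k(ranking, relevant, 5))
print(recall_at_k(ranking, relevant, 5))
print(f1_at_k(ranking, relevant, 5))
print(rbp(ranking, relevant))
```

With two relevant documents in the top five, precision@5 is 2/5 and recall@5 is 1, while RBP discounts the hit at rank 3 by p squared.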
Statistical Tests
Please refer to Smucker et al., Carterette, and Fuhr for additional information on statistical tests for Information Retrieval.
Off-the-shelf Qrels
You can load qrels from ir-datasets as simply as:
qrels = Qrels.from_ir_datasets("msmarco-document/dev")
Fusion Algorithms
Name     Name      Name       Name       Name
CombMIN  CombMNZ   RRF        MAPFuse    BordaFuse
CombMED  CombGMNZ  RBC        PosFuse    Weighted BordaFuse
CombANZ  ISR       WMNZ       ProbFuse   Condorcet
CombMAX  Log_ISR   Mixed      SegFuse    Weighted Condorcet
CombSUM  LogN_ISR  BayesFuse  SlideFuse  Weighted Sum
Please refer to the documentation for further details.
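As an illustration of what one of the listed algorithms does, the following is a minimal pure-Python sketch of Reciprocal Rank Fusion (RRF) for a single query. It follows the commonly cited formula, in which each document receives the sum of 1 / (k + rank) over the input rankings. The function name, the toy runs, and the constant k=60 are assumptions made for this example and are not taken from ranx's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60):
    """Fuse several {doc_id: score} dicts for one query using RRF."""
    fused = defaultdict(float)
    for run in runs:
        # Only the rank order matters, so raw scores need no normalization
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Return documents ordered by fused score, best first
    return dict(sorted(fused.items(), key=lambda item: item[1], reverse=True))


# Toy runs on different score scales (assumed for illustration)
run_a = {"d_1": 0.9, "d_2": 0.8, "d_3": 0.7}
run_b = {"d_3": 12.0, "d_1": 7.5, "d_4": 3.1}

fused = reciprocal_rank_fusion([run_a, run_b])
print(list(fused))  # ['d_1', 'd_3', 'd_2', 'd_4']
```

Because RRF looks only at ranks, the two runs can be fused even though their scores live on unrelated scales.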
Normalization Strategies
Please refer to the documentation for further details.
Installation
pip install ranx
Usage
Create Qrels and Run
from ranx import Qrels, Run
qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
               "q_2": { "d_11": 6, "d_22": 1 } }

run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_32": 0.5, "d_35": 0.4 },
             "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_22": 0.5, "d_35": 0.4 } }

qrels = Qrels(qrels_dict)
run = Run(run_dict)
Evaluate
from ranx import evaluate
# Compute score for a single metric
evaluate(qrels, run, "ndcg@5")
>>> 0.7861
# Compute scores for multiple metrics at once
evaluate(qrels, run, ["map@5", "mrr"])
>>> {"map@5": 0.6416, "mrr": 0.75}
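The scores above can be checked by hand. The sketch below recomputes MRR, MAP@5, and NDCG@5 in plain Python for the same qrels and run dictionaries, without importing ranx. It assumes the linear-gain form of NDCG (gain equal to the relevance grade) and Average Precision normalized by the number of relevant documents; the helper names are made up for this example.

```python
import math

qrels = {
    "q_1": {"d_12": 5, "d_25": 3},
    "q_2": {"d_11": 6, "d_22": 1},
}
run = {
    "q_1": {"d_12": 0.9, "d_23": 0.8, "d_25": 0.7, "d_36": 0.6, "d_32": 0.5, "d_35": 0.4},
    "q_2": {"d_12": 0.9, "d_11": 0.8, "d_25": 0.7, "d_36": 0.6, "d_22": 0.5, "d_35": 0.4},
}


def ranked_docs(scores):
    """Document ids sorted by decreasing score."""
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank(rels, ranking):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if rels.get(doc, 0) > 0:
            return 1.0 / rank
    return 0.0


def average_precision(rels, ranking, k):
    """Sum of precision at each relevant hit in the top k, over all relevant docs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        if rels.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    n_relevant = sum(1 for grade in rels.values() if grade > 0)
    return total / n_relevant if n_relevant else 0.0


def ndcg(rels, ranking, k):
    """DCG of the top k divided by the DCG of the ideal ordering (linear gain)."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    actual = dcg([rels.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(rels.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0


def mean_over_queries(metric, **kwargs):
    """Average a per-query metric over all queries in qrels."""
    return sum(metric(qrels[q], ranked_docs(run[q]), **kwargs) for q in qrels) / len(qrels)


mrr = mean_over_queries(reciprocal_rank)
map_5 = mean_over_queries(average_precision, k=5)
ndcg_5 = mean_over_queries(ndcg, k=5)
print(mrr, map_5, ndcg_5)
```

The three values come out at roughly 0.75, 0.642, and 0.786, which agrees with the output shown above up to rounding in the last digit.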
Compare
from ranx import compare
# Compare different runs and perform Two-sided Paired Student's t-Test
report = compare(
    qrels=qrels,
    runs=[run_1, run_2, run_3, run_4, run_5],
    metrics=["map@100", "mrr@100", "ndcg@10"],
    max_p=0.01  # P-value threshold
)
print(report)
#    Model    MAP@100    MRR@100    NDCG@10
---  -------  ---------  ---------  ---------
a    model_1  0.320ᵇ     0.320ᵇ     0.368ᵇᶜ
b    model_2  0.233      0.234      0.239
c    model_3  0.308ᵇ     0.309ᵇ     0.330ᵇ
d    model_4  0.366ᵃᵇᶜ   0.367ᵃᵇᶜ   0.408ᵃᵇᶜ
e    model_5  0.405ᵃᵇᶜᵈ  0.406ᵃᵇᶜᵈ  0.451ᵃᵇᶜᵈ
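The superscripts in the report mark the systems that a given model beats with statistical significance. To show the kind of computation behind such a comparison, here is a minimal sketch of an exact two-sided paired randomization test (the idea behind Fisher's test) on per-query scores. It is not ranx's implementation: the function name and the per-query scores are hypothetical, and it enumerates every sign flip, which is only feasible for a handful of queries (practical implementations usually sample random permutations instead).

```python
from itertools import product


def paired_randomization_test(scores_a, scores_b):
    """Exact two-sided paired randomization test on per-query scores.

    Under the null hypothesis the sign of each per-query difference is
    arbitrary, so all 2**n sign assignments are enumerated and the share
    that is at least as extreme as the observed difference is returned.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point noise in the sums
        if permuted >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-query NDCG scores for two systems on five queries
system_a = [0.52, 0.61, 0.48, 0.70, 0.55]
system_b = [0.42, 0.41, 0.33, 0.65, 0.43]

p_value = paired_randomization_test(system_a, system_b)
print(p_value)  # 0.0625
```

System A wins on every query, yet with only five queries the smallest achievable two-sided p-value is 2/32 = 0.0625, so the difference would not pass a threshold such as max_p=0.01. This is one reason comparisons are run over many queries.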
Fusion
from ranx import fuse, optimize_fusion

best_params = optimize_fusion(
    qrels=train_qrels,
    runs=[train_run_1, train_run_2, train_run_3],
    norm="min-max",     # The norm. to apply before fusion
    method="wsum",      # The fusion algorithm to use (Weighted Sum)
    metric="ndcg@100",  # The metric to maximize
)

combined_test_run = fuse(
    runs=[test_run_1, test_run_2, test_run_3],
    norm="min-max",
    method="wsum",
    params=best_params,
)
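Conceptually, the pipeline above first rescales each run's scores and then combines them. The following pure-Python sketch shows min-max normalization followed by a weighted sum for a single query. It is an illustration under stated assumptions rather than ranx's code: the function names, the toy runs, the fixed weights (standing in for the parameters that an optimization step would search for), and the handling of constant-score runs are all made up for this example.

```python
def min_max_normalize(scores):
    """Rescale one query's scores to the [0, 1] range."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # Assumption for this sketch: a constant-score run contributes zeros
        return {doc: 0.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def weighted_sum(runs, weights):
    """Normalize each run, then sum weight * score per document."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            # Documents missing from a run simply receive no contribution from it
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return dict(sorted(fused.items(), key=lambda item: item[1], reverse=True))


# Toy runs on different score scales (assumed for illustration)
bm25 = {"d_1": 12.0, "d_2": 9.0, "d_3": 6.0}
dense = {"d_2": 0.9, "d_3": 0.7, "d_4": 0.5}

fused = weighted_sum([bm25, dense], weights=[0.4, 0.6])
print(list(fused))  # ['d_2', 'd_1', 'd_3', 'd_4']
```

Normalizing first keeps the run with the larger raw scores (here the BM25-style run) from dominating the sum, so the weights alone control each run's influence.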
Examples
Name  Link 

Overview  
Qrels and Run  
Evaluation  
Comparison and Report  
Fusion 
Documentation
Browse the documentation for more details and examples.
Citation
If you use ranx to evaluate results for your scientific publication, please consider citing it:
@inproceedings{bassani2022ranx,
  author    = {Elias Bassani},
  title     = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
  booktitle = {{ECIR} {(2)}},
  series    = {Lecture Notes in Computer Science},
  volume    = {13186},
  pages     = {259--264},
  publisher = {Springer},
  year      = {2022}
}
Feature Requests
Would you like to see other features implemented? Please open a feature request.
Want to contribute?
Would you like to contribute? Please drop me an email.
License
ranx is open-source software licensed under the MIT license.