Run
Run
stores the relevance scores estimated by the model under evaluation.
There is no constraint on the score values, i.e., zero and negative scores are not removed.
The preferred way for creating a Run
instance is converting a Python dictionary as follows:
from ranx import Run
run_dict = {
"q_1": {
"d_1": 1.5,
"d_2": 2.6,
},
"q_2": {
"d_3": 2.8,
"d_2": 1.2,
"d_5": 3.1,
},
}
run = Run(run_dict, name="bm25")
Runs
can also be loaded from TREC-style and JSON files, and from Pandas DataFrames.
Load from Files
Parse a run file into ranx.Run
.
Supported formats are JSON, TREC run, gzipped TREC run, and LZ4.
Correct import behavior is inferred from the file extension: .json
-> json
, .trec
-> trec
, .txt
-> trec
, .gz
-> trec
, .lz4
-> lz4
.
Use the argument kind
to override the default behavior.
Use the argument name
to set the name of the run. Default is None
.
run = Run.from_file("path/to/run.json") # JSON file
run = Run.from_file("path/to/run.trec") # TREC-Style file
run = Run.from_file("path/to/run.txt") # TREC-Style file with txt extension
run = Run.from_file("path/to/run.gz") # Gzipped TREC-Style file
run = Run.from_file("path/to/run.lz4") # lz4 file produced by saving a ranx.Run as lz4
run = Run.from_file("path/to/run.custom", kind="json") # Loaded as JSON file
Load from Pandas DataFrames
ranx
can load runs
from Pandas DataFrames.
The argument name
is used to set the name of the run. Default is None
.
from pandas import DataFrame
run_df = DataFrame.from_dict({
"q_id": [ "q_1", "q_1", "q_2", "q_2" ],
"doc_id": [ "d_12", "d_25", "d_11", "d_22" ],
"score": [ 0.5, 0.3, 0.6, 0.1 ],
})
run = Run.from_df(
df=run_df,
q_id_col="q_id",
doc_id_col="doc_id",
score_col="score",
name="my_run",
)
Load from Parquet files
ranx
can load runs
from Parquet files, even from remote sources.
You can control the behavior of the underlying pandas.read_parquet
function by passing additional arguments through the pd_kwargs
argument (see https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html).
The argument name
is used to set the name of the run. Default is None
.
run = Run.from_parquet(
path="/path/to/parquet/file""",
q_id_col="q_id",
doc_id_col="doc_id",
score_col="score",
pd_kwargs=None,
name="my_run",
)
Save
Write run
to path
as JSON file, TREC run, LZ4 file, or Parquet file.
File type is automatically inferred form the filename extension: .json
-> json
, .trec
-> trec
, .txt
-> trec
, and .lz4
-> lz4
, .parq
-> parquet
, .parquet
-> parquet
.
Use the kind
argument to override this behavior.
run.save("path/to/run.json") # Save as JSON file
run.save("path/to/run.trec") # Save as TREC-Style file
run.save("path/to/run.txt") # Save as TREC-Style file with txt extension
run.save("path/to/run.lz4") # Save as lz4 file
run.save("path/to/run.parq") # Save as Parquet file
run.save("path/to/run.parquet") # Save as Parquet file
run.save("path/to/run.custom", kind="json") # Save as JSON file
Make comparable
It adds empty results for queries missing from the run and removes those not appearing in qrels.
run.make_comparable(qrels)