Code
I enjoy designing and building efficient and easy-to-use open-source tools:
- ranx: a Python library for evaluating and comparing Information Retrieval and Recommender Systems. It is built on top of Numba for high-speed vector operations and automatic parallelization. ranx was featured at ECIR 2022 and CIKM 2022.
- ranxhub: ranx’s companion repository for sharing pre-computed runs on Information Retrieval datasets, such as MS MARCO. ranxhub will be featured at SIGIR 2023.
- retriv: a search engine implemented in Python that supports sparse retrieval (traditional search with BM25 or TF-IDF), dense retrieval (semantic search), and hybrid retrieval (a combination of sparse and dense retrieval). It lets you build a search engine in a single line of code.
- indxr: a Python utility for indexing long files so that specific lines can be read on demand without loading the whole file into RAM. It is particularly useful for managing large datasets, since data is loaded dynamically with a low memory footprint.
- GuardBench: a Python library for evaluating guardrail models. It provides a common interface to 40 evaluation datasets, lets you quickly compare results, and exports LaTeX tables for scientific publications. Its benchmarking pipeline can also be applied to custom datasets.
- multipipe: a Python utility for creating pipelines of functions that run on any iterable (e.g., lists, generators) using multiprocessing.
- unified-io: a Python utility that aims to unify several I/O operations (i.e., reading and writing data in different formats) under a similar interface, making them more concise and user-friendly.