Code
I enjoy designing and building efficient and easy-to-use open-source tools:
- ranx: a Python library for evaluating and comparing Information Retrieval and Recommender Systems, built on top of Numba for high-speed vector operations and automatic parallelization. ranx was featured in ECIR 2022 and CIKM 2022.
- ranxhub: ranx’s companion repository for sharing pre-computed runs for Information Retrieval datasets, such as MSMARCO. ranxhub will be featured in SIGIR 2023.
- retriv: a search engine implemented in Python that supports Sparse (traditional search with BM25 or TF-IDF), Dense (semantic search), and Hybrid (a mix of Sparse and Dense) retrieval. It allows you to build a search engine in a single line of code.
- indxr: a Python utility for indexing long files so that specific lines can be read on demand instead of loading the whole file into RAM. indxr can be particularly useful for managing large datasets with a low memory footprint.
- multipipe: a Python utility for creating pipelines of functions and executing them on any given iterable (e.g., lists, generators) by leveraging multiprocessing.
- unified-io: a Python utility that attempts to unify several I/O operations (i.e., reading and writing data in different formats) under a similar interface while making them more concise and user-friendly.
- CDE-IR: a configuration-driven framework for reproducible Information Retrieval experiments based on PyTorch, PyTorch Lightning, Transformers and Hydra. This is a work in progress.
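To give a rough feel for the idea behind indxr described above (indexing a long file so that single lines can be read on demand), here is a minimal sketch using only the Python standard library. It is not indxr's actual API or implementation: the `LineIndex` class and the `demo` function are hypothetical names introduced purely for illustration, and the sketch assumes a UTF-8 text file with one record per line.

```python
import os
import tempfile


class LineIndex:
    """Hypothetical helper (not indxr's API): read lines by byte offset."""

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # One sequential pass records where each line starts.
        # Only the offsets are kept in memory, never the line contents.
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        # Jump straight to the requested line and read only that line.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return f.readline().decode("utf-8").rstrip("\r\n")


def demo():
    # Build a tiny throwaway file so the example is self-contained.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("first\nsecond\nthird\n")
        path = tmp.name
    try:
        index = LineIndex(path)
        return len(index), index[1], index[2]
    finally:
        os.remove(path)
```

Storing one integer per line keeps the memory cost proportional to the number of lines rather than to the file size, which is the trade-off that makes this kind of index attractive for large datasets.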