Ranking Module

BM25 ranking for candidate papers.

Implements a compact BM25 ranker using only the Python standard library, to keep dependencies minimal. Tokenization lowercases the text and splits it on non-word characters, which is sufficient for a baseline ranking.
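The tokenization rule and the BM25 score can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code. The helper names `tokenize` and `bm25_scores` are hypothetical. The constants `k1 = 1.5` and `b = 0.75` and the non-negative IDF variant `log((N - df + 0.5) / (df + 0.5) + 1)` are common defaults assumed here, because the documentation does not say which values the module uses.

```python
import math
import re
from collections import Counter
from typing import List, Sequence

# Split on runs of non-word characters (anything outside [A-Za-z0-9_] and Unicode letters/digits).
_NON_WORD = re.compile(r"\W+")


def tokenize(text: str) -> List[str]:
    """Hypothetical helper: lowercase `text`, split on non-word characters, drop empty tokens."""
    return [tok for tok in _NON_WORD.split(text.lower()) if tok]


def bm25_scores(
    query: str,
    documents: Sequence[str],
    k1: float = 1.5,  # assumed term-frequency saturation constant
    b: float = 0.75,  # assumed document-length normalization constant
) -> List[float]:
    """Score each document against `query` with Okapi BM25 (illustrative sketch)."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization; guard against every document being empty.
        norm = k1 * (1 - b + b * len(doc) / avgdl) if avgdl > 0 else k1
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Assumed IDF variant: the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

A document that shares no query terms with the query scores 0.0. Rarer terms contribute more through the IDF factor, and `b` controls how strongly long documents are penalized.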

agent.pipeline.ranking.rank_candidates(*, query, candidates, top_k)

Rank candidates with BM25 over their concatenated title and summary, and return the top-k.

Parameters:
  • query (str) – Natural-language query.

  • candidates (Iterable[PaperCandidate]) – Candidates to rank. The iterable is collected into a list internally, and each candidate's score is written to its bm25_score attribute.

  • top_k (int) – Number of items to return after sorting by score and recency.

Return type:

List[PaperCandidate]

Returns:

The top-k candidates, sorted by descending score and recency.
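Putting the pieces together, a stdlib-only version of `rank_candidates` might look like the sketch below. It relies on several assumptions. It uses a hypothetical `PaperCandidate` dataclass with `title`, `summary` and `published` fields, since only `bm25_score` is named in the documentation. The BM25 constants are assumed values. It treats recency as a secondary sort key that orders equal scores newest first, and the real module may combine score and recency differently.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

_NON_WORD = re.compile(r"\W+")
K1 = 1.5   # assumed BM25 term-frequency saturation constant
B = 0.75   # assumed BM25 length-normalization constant


@dataclass
class PaperCandidate:
    # Hypothetical fields: only `bm25_score` is documented for the real class.
    title: str
    summary: str
    published: datetime
    bm25_score: float = 0.0


def _tokenize(text: str) -> List[str]:
    # Lowercase, split on non-word characters, drop empty strings.
    return [t for t in _NON_WORD.split(text.lower()) if t]


def rank_candidates(
    *, query: str, candidates: Iterable[PaperCandidate], top_k: int
) -> List[PaperCandidate]:
    items = list(candidates)  # materialize the iterable exactly once
    if not items:
        return []
    # Each candidate's document is its title followed by its summary.
    docs = [_tokenize(f"{c.title} {c.summary}") for c in items]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    query_terms = _tokenize(query)

    for cand, doc in zip(items, docs):
        tf = Counter(doc)
        norm = K1 * (1 - B + B * len(doc) / avgdl) if avgdl > 0 else K1
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
                score += idf * tf[term] * (K1 + 1) / (tf[term] + norm)
        cand.bm25_score = score  # score stored on the candidate object

    # Descending score first; equal scores fall back to newest `published` first (assumed).
    items.sort(key=lambda c: (c.bm25_score, c.published), reverse=True)
    return items[: max(top_k, 0)]
```

In this sketch the iterable is turned into a list before scoring, so a generator can be passed safely. The list holds the caller's own objects, so their updated `bm25_score` values remain visible after the call.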