Ranking Module

BM25 ranking for candidate papers.

Implements a compact BM25 ranker that uses only the Python standard library, keeping dependencies minimal. Tokenization lowercases the text and splits it on runs of non-word characters, which is sufficient for a baseline ranking.
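The tokenization rule can be sketched as follows. This is a minimal sketch that assumes the split uses the regular expression `\W+` from Python's `re` module; the helper name `tokenize` is hypothetical. For `str` patterns, `\W` is Unicode-aware and treats the underscore as a word character, so `snake_case` stays a single token.

```python
import re
from typing import List

# Assumed separator: one or more non-word characters.
_NON_WORD = re.compile(r"\W+")


def tokenize(text: str) -> List[str]:
    """Lowercase ``text`` and split it on runs of non-word characters (hypothetical helper)."""
    # re.split yields empty strings when the text starts or ends with a
    # separator, so those are filtered out.
    return [tok for tok in _NON_WORD.split(text.lower()) if tok]


print(tokenize("Attention Is All You Need (2017)"))
# ['attention', 'is', 'all', 'you', 'need', '2017']
```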

agent.pipeline.ranking.rank_candidates(*, query, candidates, top_k)

Rank candidates with BM25 over each candidate's title + summary and return the top-k.
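For reference, the textbook Okapi BM25 score of a document D (here, a candidate's title and summary) for a query Q is given below, where f(t, D) is the count of term t in D, |D| the token length of D, avgdl the mean length over all candidates, N the number of candidates, and n(t) the number of candidates containing t. The values of k_1 and b and the exact IDF variant used by this module are not documented; the "+1" inside the logarithm (as in Lucene) is shown as an assumption, chosen because it keeps the IDF non-negative.

```latex
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1 \left(1 - b + b \,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n(t) + 0.5}{n(t) + 0.5} + 1\right)
```

Common defaults are k_1 between 1.2 and 2.0 and b = 0.75.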

Parameters:

query (str) – Natural-language query.
candidates (Iterable[PaperCandidate]) – Candidates to rank. The iterable is collected into a list internally, and each candidate's score is written to its bm25_score attribute.
top_k (int) – Number of items to return after sorting by score and recency.

Return type:

List[PaperCandidate]

Returns:

The top-k candidates, sorted by descending score and recency.
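The documented behaviour can be sketched end to end as follows. This is a minimal, self-contained sketch that mirrors the documented signature, not the module's actual code. The `PaperCandidate` dataclass is a hypothetical stand-in, the recency field name `published` is assumed, and the constants `K1`/`B` and the Lucene-style IDF are assumptions. Ties on score are broken by recency, newest first.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

_NON_WORD = re.compile(r"\W+")
K1 = 1.5   # assumed BM25 term-frequency saturation constant
B = 0.75   # assumed BM25 length-normalization constant


def tokenize(text: str) -> List[str]:
    # Lowercase, split on runs of non-word characters, drop empty strings.
    return [tok for tok in _NON_WORD.split(text.lower()) if tok]


@dataclass
class PaperCandidate:
    """Hypothetical stand-in; the real class may have other fields."""
    title: str
    summary: str
    published: date          # assumed name of the recency field
    bm25_score: float = 0.0


def rank_candidates(*, query: str, candidates: Iterable[PaperCandidate],
                    top_k: int) -> List[PaperCandidate]:
    pool = list(candidates)                  # materialize the iterable once
    if not pool or top_k <= 0:
        return []
    docs = [tokenize(f"{c.title} {c.summary}") for c in pool]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0   # guard against 0
    # Document frequency: number of candidates containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)            # repeated terms count repeatedly
    for cand, doc in zip(pool, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = f + K1 * (1.0 - B + B * len(doc) / avgdl)
            score += idf * f * (K1 + 1.0) / norm
        cand.bm25_score = score              # score written onto the candidate
    # Descending by score, then by recency (newest first) among equal scores.
    pool.sort(key=lambda c: (c.bm25_score, c.published), reverse=True)
    return pool[:top_k]


papers = [
    PaperCandidate("BM25 revisited", "Probabilistic ranking with BM25.", date(2021, 5, 1)),
    PaperCandidate("Graph neural networks", "Message passing on graphs.", date(2023, 1, 1)),
    PaperCandidate("Okapi at TREC", "BM25 term weighting experiments.", date(2019, 3, 1)),
]
top = rank_candidates(query="BM25 ranking", candidates=papers, top_k=2)
print([p.title for p in top])
# ['BM25 revisited', 'Okapi at TREC']
```

Sorting on the tuple `(score, published)` with `reverse=True` keeps the tie-break in a single pass; a candidate that matches no query term keeps a score of 0.0 and falls below every matching one.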