Ranking Module
BM25 ranking for candidate papers.
Implements a compact BM25 ranker using only the Python standard library, which keeps dependencies minimal. Tokenization lowercases the text and splits it on non-word characters, which is sufficient for baseline ranking.
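The tokenization described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `tokenize` and the choice of `re.split` with `\W+` are assumptions about how a "lowercased split on non-word characters" might be written.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-word characters.

    Hypothetical helper for illustration; the real module may differ.
    """
    # re.split can yield empty strings at the edges (e.g. trailing
    # punctuation), so those are filtered out.
    return [token for token in re.split(r"\W+", text.lower()) if token]
```

Note that `\W` treats the underscore as a word character, so `snake_case` stays a single token under this sketch.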
- agent.pipeline.ranking.rank_candidates(*, query, candidates, top_k)
Rank candidates with BM25 over title + summary and return top-k.
- Parameters:
  - query (str) – Natural-language query.
  - candidates (Iterable[PaperCandidate]) – Iterable of PaperCandidate to be ranked. Candidates are copied to a list internally, and scores are written to their bm25_score attribute.
  - top_k (int) – Number of items to return after sorting by score and recency.
- Return type:
- Returns:
The top-k candidates, sorted by descending score and recency.
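To make the documented behavior concrete, here is a rough, standard-library-only sketch of BM25 ranking over title + summary. It is not the module's implementation. Everything beyond the documented signature is an assumption: the `PaperCandidate` stand-in and its `published` field (used here as the recency signal), the name `rank_candidates_sketch`, the parameters `k1=1.5` and `b=0.75`, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class PaperCandidate:
    # Hypothetical stand-in; the real class may have different fields.
    title: str
    summary: str
    published: date  # assumed recency field, not confirmed by the docs
    bm25_score: float = 0.0


def _tokenize(text: str) -> list[str]:
    # Lowercased split on non-word characters, dropping empty tokens.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def rank_candidates_sketch(
    *,
    query: str,
    candidates: Iterable[PaperCandidate],
    top_k: int,
    k1: float = 1.5,   # assumed term-frequency saturation parameter
    b: float = 0.75,   # assumed length-normalization parameter
) -> list[PaperCandidate]:
    docs = list(candidates)  # copy so any iterable can be consumed once
    if not docs:
        return []

    tokenized = [_tokenize(f"{d.title} {d.summary}") for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq: Counter[str] = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    query_terms = _tokenize(query)
    for doc, toks in zip(docs, tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            length_norm = 1 - b + b * len(toks) / (avg_len or 1.0)
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        doc.bm25_score = score  # written back onto the candidate

    # Highest score first; among equal scores, most recent first.
    docs.sort(key=lambda d: (d.bm25_score, d.published), reverse=True)
    return docs[:top_k]
```

In this sketch recency only acts as a tie-breaker between equal scores; whether the real function weighs recency the same way is not stated in the documentation above.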