Search Module
Search utilities for the pipeline.
This module provides:
- Query generation (simple heuristic without embeddings)
- Retrieval from multiple sources (arXiv, Google Scholar, PubMed, GitHub)
All functions are synchronous wrappers around synchronous parsers, which keeps the initial integration simple. For now, the pipeline orchestrator can call them directly or run them in worker threads.
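As a rough illustration of running synchronous search functions in threads, the sketch below uses the standard library's `ThreadPoolExecutor`. The functions `fake_arxiv_search` and `fake_pubmed_search` and the helper `run_searches_in_threads` are hypothetical stand-ins invented for this example; they are not part of this module, and the real orchestrator may work differently.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for synchronous search functions (assumption:
# keyword-only arguments, a list as the return value). No network access.
def fake_arxiv_search(*, query, max_results=100, start=0):
    return [f"arxiv:{query}:{i}" for i in range(start, start + min(max_results, 2))]


def fake_pubmed_search(*, query, max_results=50, start=0):
    return [f"pubmed:{query}:{i}" for i in range(start, start + min(max_results, 2))]


def run_searches_in_threads(query, searchers, max_workers=4):
    """Call each synchronous searcher in a worker thread.

    Results are returned in the same order as `searchers`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, query=query) for fn in searchers]
        # .result() blocks until that call finishes and re-raises its exception, if any.
        return [future.result() for future in futures]


results = run_searches_in_threads("rag", [fake_arxiv_search, fake_pubmed_search])
for source_results in results:
    print(len(source_results), source_results[0])
```

Because the calls are I/O-bound in practice, threads are usually enough here; no change to the search functions themselves is needed.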
- agent.pipeline.search.arxiv_search(*, query, categories=None, max_results=100, start=0)
Search arXiv and convert results to PaperCandidate items.
- Parameters:
- Return type:
- Returns:
A list of candidate papers converted from arXiv results.
Example:
    from agent.pipeline.search import arxiv_search

    items = arxiv_search(query="RAG AND small datasets", max_results=10)
    print(len(items))
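The `start` and `max_results` parameters suggest offset-based paging. The sketch below shows one way to walk through pages under the assumption that `start` is a zero-based offset into the result list; the documentation does not confirm this. `fake_search` and `fetch_all` are hypothetical names used only for illustration.

```python
def fake_search(*, query, max_results=100, start=0):
    """Hypothetical stand-in for a search function; serves slices of a fixed corpus."""
    corpus = [f"{query}-{i}" for i in range(25)]
    return corpus[start:start + max_results]


def fetch_all(search_fn, query, page_size=10, limit=100):
    """Request pages until `limit` items are collected or a page comes back empty."""
    items = []
    start = 0
    while len(items) < limit:
        page = search_fn(
            query=query,
            max_results=min(page_size, limit - len(items)),
            start=start,
        )
        if not page:
            break  # no more results
        items.extend(page)
        start += len(page)  # assumption: `start` is a zero-based offset
    return items


all_items = fetch_all(fake_search, "rag", page_size=10)
print(len(all_items))
```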
- agent.pipeline.search.collect_candidates(task, queries, per_query_limit=50)
Run source-specific search per query and collect unique candidates.
- Parameters:
task (PipelineTask) – The pipeline task providing categories and other context.
queries (Iterable[GeneratedQuery]) – Iterable of GeneratedQuery with per-query source.
per_query_limit (int) – Max results retrieved for each query (default 50).
- Return type:
- Returns:
Unique candidates from all queries.
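The per-source dispatch and de-duplication described above could look roughly like the following. This is a sketch, not the module's implementation: `Candidate` and `Query` are hypothetical stand-ins for `PaperCandidate` and `GeneratedQuery`, the `searchers` mapping is invented for the example, and using the link as the uniqueness key is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:  # hypothetical stand-in for PaperCandidate
    title: str
    link: str


@dataclass(frozen=True)
class Query:  # hypothetical stand-in for GeneratedQuery
    text: str
    source: str


def collect_unique(queries, searchers, per_query_limit=50):
    """Run the searcher matching each query's source and keep first-seen candidates."""
    seen = set()
    unique = []
    for q in queries:
        search = searchers.get(q.source)
        if search is None:
            continue  # unknown source: skip the query
        for cand in search(query=q.text, max_results=per_query_limit):
            # Assumption: the link (minus a trailing slash) identifies a candidate.
            key = cand.link.strip().rstrip("/")
            if key not in seen:
                seen.add(key)
                unique.append(cand)
    return unique


# Stub searchers returning fixed data, for demonstration only.
def stub_arxiv(*, query, max_results=50):
    hits = [Candidate("Paper A", "https://example.org/a"),
            Candidate("Paper B", "https://example.org/b")]
    return hits[:max_results]


def stub_github(*, query, max_results=50):
    hits = [Candidate("Repo for A", "https://example.org/a/"),
            Candidate("Repo C", "https://example.org/c")]
    return hits[:max_results]


queries = [Query("rag", "arxiv"), Query("rag", "github"), Query("rag", "unknown")]
found = collect_unique(queries, {"arxiv": stub_arxiv, "github": stub_github})
print([c.title for c in found])
```

Keeping the first occurrence preserves query order, so results from earlier queries take priority over later duplicates.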
- agent.pipeline.search.github_search(*, query, max_results=50, start=0)
Search GitHub repositories and represent them as candidates.
The pipeline treats repositories as candidates with title and snippet.
- Parameters:
- Return type:
- Returns:
Candidate list with repo name and link.
- agent.pipeline.search.pubmed_search(*, query, max_results=50, start=0)
Search PubMed and convert results to candidates.
- Parameters:
- Return type:
- Returns:
Candidate list with title and PubMed link.
- agent.pipeline.search.scholar_search(*, query, max_results=50, start=0)
Search Google Scholar and convert results to lightweight candidates.
Since Scholar results do not provide abstracts, the summary field uses the snippet text when available. Categories and arXiv-specific fields are left empty.
- Parameters:
- Return type:
- Returns:
Candidate list with title and link.
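The snippet-to-summary mapping described for Scholar results might be sketched as below. The `Candidate` dataclass, the raw result keys (`title`, `link`, `snippet`), and the function `scholar_result_to_candidate` are all assumptions made for illustration; the actual `PaperCandidate` fields and parser output may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:  # hypothetical stand-in for PaperCandidate
    title: str
    link: str
    summary: str = ""
    categories: list = field(default_factory=list)


def scholar_result_to_candidate(raw):
    """Map a raw result dict (assumed keys: title, link, snippet) to a Candidate."""
    return Candidate(
        title=(raw.get("title") or "").strip(),
        link=raw.get("link") or "",
        # No abstract is available, so fall back to the snippet (or empty text).
        summary=(raw.get("snippet") or "").strip(),
        # categories stays empty: there is no arXiv-style classification here.
    )


with_snippet = scholar_result_to_candidate(
    {"title": " Retrieval for small corpora ", "link": "https://example.org/p1",
     "snippet": "We study retrieval ..."}
)
without_snippet = scholar_result_to_candidate(
    {"title": "Another paper", "link": "https://example.org/p2"}
)
print(with_snippet.summary, repr(without_snippet.summary))
```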