Browsing
Overview
The browsing package provides two complementary layers:
Manual sources: lightweight, stateless classes to perform provider-specific searches and return normalized SearchItem results.
Agent tools: functions decorated for the Agents SDK that wrap manual sources (or the arXiv parser) and return JSON-serializable dictionaries for tool calling.
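As a rough illustration of the second layer, the sketch below turns a normalized result into a JSON-serializable dictionary. The SearchItem dataclass is a local stand-in that mirrors the fields documented under Manual Sources, and item_to_dict and the example values are hypothetical; the package's actual tools may shape their output differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SearchItem:
    """Local stand-in with the fields documented for SearchItem."""
    title: str
    url: str
    snippet: Optional[str] = None
    item_id: Optional[str] = None
    extra: Optional[Dict[str, Any]] = None


def item_to_dict(item: SearchItem) -> Dict[str, Any]:
    """Hypothetical helper: flatten an item into a plain dict for tool output."""
    return asdict(item)


# Made-up example values, only to show the round trip through JSON.
item = SearchItem(title="Example paper", url="https://example.org/paper", item_id="12345")
payload = json.dumps({"results": [item_to_dict(item)]})
print(payload)
```

Returning plain dictionaries keeps the tool output independent of the Python classes used internally.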
Exports
The top-level agent.browsing package exposes a curated set of utilities:
ArxivBrowser: high-level arXiv helper built on shared.arxiv_parser
arxiv_search_tool, arxiv_get_paper_tool: arXiv tools
web_search_tool: DuckDuckGo web search tool
google_scholar_search_tool, pubmed_search_tool, github_repo_search_tool: agent tools backed by manual sources
Manual Sources
- class agent.browsing.manual.sources.base.ManualSource(*args, **kwargs)
Bases: Protocol
Protocol for manual browsing sources.
Implementations should be stateless or manage their own lightweight state.
- iter_all(query, chunk_size=100, limit=None, **kwargs)
Iterate over results by fetching in chunks.
- search(query, max_results=25, start=0, **kwargs)
Return a single page of results for a query.
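Because ManualSource is a Protocol, any class with matching search and iter_all methods satisfies it without inheriting from it. The sketch below redefines the protocol locally from the documented signatures and pairs it with a hypothetical InMemorySource; neither is the package's own code.

```python
from typing import Any, Iterator, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ManualSource(Protocol):
    """Local sketch of the protocol, using the documented signatures."""

    def search(self, query: str, max_results: int = 25, start: int = 0,
               **kwargs: Any) -> List[Any]: ...

    def iter_all(self, query: str, chunk_size: int = 100,
                 limit: Optional[int] = None, **kwargs: Any) -> Iterator[Any]: ...


class InMemorySource:
    """Hypothetical toy source that filters a fixed list of titles."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = titles

    def _matches(self, query: str) -> List[str]:
        return [t for t in self._titles if query.lower() in t.lower()]

    def search(self, query: str, max_results: int = 25, start: int = 0,
               **kwargs: Any) -> List[str]:
        # One page: a slice of the matches starting at `start`.
        return self._matches(query)[start:start + max_results]

    def iter_all(self, query: str, chunk_size: int = 100,
                 limit: Optional[int] = None, **kwargs: Any) -> Iterator[str]:
        # Everything is already in memory, so chunk_size is not needed here.
        matches = self._matches(query)
        return iter(matches if limit is None else matches[:limit])


source: ManualSource = InMemorySource(["Graph search", "Search trees", "Sorting"])
first_page = source.search("search", max_results=1)
everything = list(source.iter_all("search"))
```

The runtime_checkable decorator only verifies that the methods exist; it does not check their signatures.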
- class agent.browsing.manual.sources.base.SearchItem(title, url, snippet=None, item_id=None, extra=None)
Bases: object
Lightweight search result item for manual browsing.
- Variables:
title – Human-readable title of the item.
url – Canonical URL for the item.
snippet – Optional short snippet or summary.
item_id – Optional stable identifier when available, e.g., a PubMed ID.
extra – Optional provider-specific metadata.
- agent.browsing.manual.sources.base.paginate_results(results, limit)
Yield up to a limit of results from an iterable.
- Parameters:
results (Iterable[SearchItem]) – Iterable of search items to paginate.
limit (Optional[int]) – Optional maximum number of items to yield.
- Returns:
Iterator yielding up to limit items.
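A helper with this contract can be sketched in a few lines. The version below is an assumption about the behavior rather than the package's source: it treats limit=None as "no cap" and uses plain integers and strings in place of SearchItem objects.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def paginate_results(results: Iterable[T], limit: Optional[int]) -> Iterator[T]:
    """Yield at most `limit` items; None is assumed to mean no cap."""
    # islice stops pulling from `results` as soon as the cap is reached,
    # so a lazy upstream iterator is not consumed further than needed.
    yield from islice(results, limit)


capped = list(paginate_results(iter(range(10)), 3))
uncapped = list(paginate_results(["a", "b"], None))
```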
Google Scholar manual browsing via DuckDuckGo site-restricted search.
This module intentionally avoids scraping Google Scholar directly. It uses the ddgs package (or the legacy duckduckgo_search package as a fallback) to retrieve public result snippets limited to the Scholar domain.
- class agent.browsing.manual.sources.google_scholar.GoogleScholarBrowser(*args, **kwargs)
Bases: ManualSource
Manual source for Google Scholar using site-restricted web search.
Note: result metadata is limited to title, URL, and snippet.
- iter_all(query, chunk_size=100, limit=None, *, region='wt-wt', **kwargs)
Iterate through Scholar results by fetching in chunks.
- Returns:
Iterator over normalized search items.
- search(query, max_results=25, start=0, *, region='wt-wt', **kwargs)
Search Scholar results using DuckDuckGo site restriction.
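Site restriction usually means prefixing the query with a site: operator and then mapping each raw hit to the title, URL, and snippet fields. The sketch below shows that idea without any network call. The domain constant, the helper names, and the raw key names (title, href, body) are assumptions for illustration, not confirmed details of this module.

```python
from typing import Any, Dict, Mapping

# Assumption: the restriction targets this domain.
SCHOLAR_DOMAIN = "scholar.google.com"


def build_site_query(query: str, domain: str = SCHOLAR_DOMAIN) -> str:
    """Prefix a free-text query with a site: restriction."""
    return f"site:{domain} {query.strip()}"


def normalize_hit(hit: Mapping[str, Any]) -> Dict[str, Any]:
    """Map a raw search hit to title/url/snippet (raw key names are assumed)."""
    return {
        "title": hit.get("title", ""),
        "url": hit.get("href") or hit.get("url", ""),
        "snippet": hit.get("body"),
    }


# Made-up raw hit, standing in for what a search backend might return.
raw_hits = [{"title": "A paper", "href": "https://scholar.google.com/x", "body": "Snippet"}]
restricted = build_site_query("  graph neural networks ")
items = [normalize_hit(hit) for hit in raw_hits]
```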
PubMed manual browsing using NCBI E-utilities (ESearch + ESummary).
No additional dependencies are required. Network calls use requests, and results are returned as lightweight SearchItem objects with stable PubMed IDs.
- class agent.browsing.manual.sources.pubmed.PubMedBrowser(*args, **kwargs)
Bases: ManualSource
Manual source for PubMed articles using E-utilities JSON endpoints.
- iter_all(query, chunk_size=100, limit=None, **kwargs)
Iterate through PubMed results by fetching in chunks.
- search(query, max_results=25, start=0, **kwargs)
Search PubMed and return a page of results.
This uses esearch.fcgi to obtain a list of PMIDs, then esummary.fcgi to fetch basic metadata.
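The two-step flow can be sketched as request parameters for each endpoint plus a small parser for the ESearch response. The helper names and the sample payload are hypothetical, and no network call is made. The parameter names (db, term, retmax, retstart, retmode, id) are standard E-utilities parameters, but how PubMedBrowser builds its requests is not shown in this page.

```python
from typing import Any, Dict, List, Mapping

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_params(query: str, max_results: int = 25, start: int = 0) -> Dict[str, Any]:
    """Step 1: parameters for esearch.fcgi, which returns matching PMIDs."""
    return {"db": "pubmed", "term": query, "retmax": max_results,
            "retstart": start, "retmode": "json"}


def esummary_params(pmids: List[str]) -> Dict[str, Any]:
    """Step 2: parameters for esummary.fcgi, which returns basic metadata."""
    return {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}


def extract_pmids(esearch_json: Mapping[str, Any]) -> List[str]:
    """Pull the PMID list out of an ESearch JSON response."""
    return list(esearch_json.get("esearchresult", {}).get("idlist", []))


# Made-up response body, only to show the hand-off between the two steps.
sample_response = {"esearchresult": {"idlist": ["111", "222"]}}
pmids = extract_pmids(sample_response)
summary_request = (f"{EUTILS_BASE}/esummary.fcgi", esummary_params(pmids))
```

With requests, each step would then be a GET such as requests.get(url, params=params, timeout=10), followed by a call to .json() on the response.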
GitHub manual browsing using the public Search API.
Respects the GITHUB_TOKEN environment variable if present to increase rate limits. Returns repository-level results ordered by stars.
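One way to honor an optional token is to add an Authorization header only when the variable is set. The sketch below is an assumption about how this could be done; github_headers is a hypothetical helper, and the environment is passed in as a mapping so the example runs without touching real credentials.

```python
import os
from typing import Dict, Mapping, Optional


def github_headers(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers, adding a bearer token only if GITHUB_TOKEN is set."""
    env = os.environ if environ is None else environ
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


anonymous = github_headers({})
authenticated = github_headers({"GITHUB_TOKEN": "example-token"})
```

Unauthenticated requests still work but are subject to GitHub's lower rate limits.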
- class agent.browsing.manual.sources.github.GitHubRepoBrowser(*args, **kwargs)
Bases: ManualSource
Manual source for GitHub repository search.
- iter_all(query, chunk_size=100, limit=None, **kwargs)
Iterate through repository search results by fetching in chunks.
- search(query, max_results=25, start=0, **kwargs)
Search repositories by query, sorted by stars in descending order.
Pagination is mapped from start and max_results to GitHub’s page and per_page parameters.
- Returns:
List of normalized repository items.
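The offset-to-page mapping can be sketched as follows. This is one plausible mapping, not necessarily the one GitHubRepoBrowser uses: to_github_paging is a hypothetical helper, and the result is exact only when start is a multiple of the page size. The per_page ceiling of 100 is GitHub's documented maximum.

```python
from typing import Dict, Union

GITHUB_MAX_PER_PAGE = 100  # GitHub caps per_page at 100


def to_github_paging(start: int, max_results: int) -> Dict[str, Union[int, str]]:
    """Translate a 0-based offset and page size into GitHub query parameters."""
    per_page = max(1, min(max_results, GITHUB_MAX_PER_PAGE))
    # GitHub pages are 1-based; an offset inside a page rounds down to that page.
    page = start // per_page + 1
    return {"sort": "stars", "order": "desc", "per_page": per_page, "page": page}


first = to_github_paging(start=0, max_results=25)
third = to_github_paging(start=50, max_results=25)
clamped = to_github_paging(start=0, max_results=500)
```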
ArXiv Manual Browser
Manual browsing utilities for arXiv.
Provides a simple ArxivBrowser class that accepts search queries and returns results in a convenient, strongly-typed form using the shared arXiv parser.
- class agent.browsing.manual.manual.ArxivBrowser(downloads_dir='downloads')
Bases: object
High-level wrapper for performing arXiv searches.
The browser exposes simple methods to:
- Fetch a page of results for a query
- Iterate over all results for a query in chunks
- Retrieve a single paper by arXiv ID
Example:
    from agent.browsing.manual import ArxivBrowser

    browser = ArxivBrowser()
    page = browser.search("transformers AND speech", max_results=5)
    print([p.title for p in page])
- Parameters:
downloads_dir (str)
- get(arxiv_id)
Retrieve a single paper by arXiv ID.
- Parameters:
arxiv_id (str) – The arXiv identifier (with or without version suffix).
- Returns:
The corresponding ArxivPaper instance if found; otherwise None.
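An identifier such as 1706.03762v5 carries a version suffix, while 1706.03762 does not. The sketch below shows one way to separate the two parts. It is an illustration only: split_arxiv_id is a hypothetical helper, and how ArxivBrowser.get treats the suffix internally is not described here.

```python
import re
from typing import Optional, Tuple

# A trailing "v" followed by digits marks the version, e.g. "v5".
_VERSION_SUFFIX = re.compile(r"v(\d+)$")


def split_arxiv_id(arxiv_id: str) -> Tuple[str, Optional[int]]:
    """Return (base identifier, version) where version is None if absent."""
    cleaned = arxiv_id.strip()
    match = _VERSION_SUFFIX.search(cleaned)
    if match is None:
        return cleaned, None
    return cleaned[:match.start()], int(match.group(1))


versioned = split_arxiv_id("1706.03762v5")
unversioned = split_arxiv_id("1706.03762")
old_style = split_arxiv_id("hep-th/9901001v2")
```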
- iter_all(query, categories=None, date_from_days=None, chunk_size=100, limit=None)
Iterate over all results for a query by fetching in chunks.
- Parameters:
query (str) – Free-text search query.
categories (Optional[List[str]]) – Optional list of arXiv category filters.
date_from_days (Optional[int]) – If provided, limit results to within the last N days.
chunk_size (int) – Number of results fetched per request.
limit (Optional[int]) – If provided, stop after yielding at most limit results.
- Yields:
ArxivPaper instances one by one, until exhausted or limit reached.
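Chunked iteration of this kind typically loops over pages, advancing an offset until a page comes back empty or short, or until the limit is hit. The sketch below shows that loop in isolation. iter_in_chunks and fake_fetch are hypothetical names, a list of strings stands in for ArxivPaper objects, and the stopping rule on a short page is an assumption.

```python
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iter_in_chunks(fetch_page: Callable[[int, int], List[T]],
                   chunk_size: int = 100,
                   limit: Optional[int] = None) -> Iterator[T]:
    """Yield items page by page; fetch_page(start, size) returns one page."""
    start = 0
    yielded = 0
    while limit is None or yielded < limit:
        page = fetch_page(start, chunk_size)
        if not page:
            return  # nothing left
        for item in page:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        if len(page) < chunk_size:
            return  # short page: assume the result set is exhausted
        start += len(page)


corpus = [f"paper-{i}" for i in range(7)]


def fake_fetch(start: int, size: int) -> List[str]:
    """Stand-in for a network request returning one page of results."""
    return corpus[start:start + size]


first_five = list(iter_in_chunks(fake_fetch, chunk_size=3, limit=5))
all_items = list(iter_in_chunks(fake_fetch, chunk_size=3))
```

Because the function is a generator, no request is made until the caller starts consuming results.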
- search(query, max_results=25, start=0, categories=None, date_from_days=None)
Search arXiv and return a single page of results.
- Parameters:
query (str) – Free-text search query.
max_results (int) – Maximum number of results to return in this page.
start (int) – Pagination start index (0-based) across the full result set.
categories (Optional[List[str]]) – Optional list of arXiv category filters (e.g., ["cs.AI"]).
date_from_days (Optional[int]) – If provided, limit results to within the last N days.
- Returns:
A page of results.
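The date_from_days parameter describes a rolling window ending now. The sketch below shows how such a window could be turned into a cutoff timestamp and applied to a publication date. Both helpers are hypothetical, and whether the shared parser filters client-side or encodes the window in the arXiv query is not stated in this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cutoff_for(date_from_days: Optional[int],
               now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the earliest allowed timestamp, or None when no window is set."""
    if date_from_days is None:
        return None
    reference = now or datetime.now(timezone.utc)
    return reference - timedelta(days=date_from_days)


def is_recent(published: datetime, date_from_days: Optional[int],
              now: Optional[datetime] = None) -> bool:
    """True if `published` falls inside the last N days (or no window is set)."""
    cutoff = cutoff_for(date_from_days, now)
    return cutoff is None or published >= cutoff


# A fixed reference time keeps the example deterministic.
fixed_now = datetime(2024, 1, 31, tzinfo=timezone.utc)
week_cutoff = cutoff_for(7, now=fixed_now)
```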
- search_all(query, categories=None, date_from_days=None, chunk_size=100, limit=None)
Collect results for a query into a list by consuming the iterator.
- Parameters:
query (str) – Free-text search query.
categories (Optional[List[str]]) – Optional list of arXiv category filters.
date_from_days (Optional[int]) – If provided, limit results to within the last N days.
chunk_size (int) – Number of results fetched per request.
limit (Optional[int]) – If provided, stop after collecting at most limit results.
- Returns:
Collected results list.