Browsing

Overview

The browsing package provides two complementary layers:

Manual sources: lightweight, stateless classes that perform provider-specific searches and return normalized SearchItem results.
Agent tools: functions decorated for the Agents SDK that wrap manual sources (or the arXiv parser) and return JSON-serializable dictionaries for tool calling.
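The relationship between the two layers can be sketched as follows. This is only an illustration under stated assumptions: `StaticSource` and `example_search_tool` are hypothetical names, the `SearchItem` class is a local stand-in that mirrors the documented fields, and the Agents SDK decorator is omitted so the snippet runs without it.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SearchItem:
    """Local stand-in mirroring the documented SearchItem fields."""
    title: str
    url: str
    snippet: Optional[str] = None
    item_id: Optional[str] = None
    extra: Optional[dict] = None


class StaticSource:
    """Hypothetical manual source that serves a fixed list (no network)."""

    def __init__(self, items: List[SearchItem]) -> None:
        self._items = items

    def search(self, query: str, max_results: int = 25, start: int = 0) -> List[SearchItem]:
        # A real source would query a provider; here we only slice the list.
        return self._items[start:start + max_results]


def example_search_tool(source: StaticSource, query: str, max_results: int = 5) -> Dict[str, Any]:
    """Hypothetical tool body: call the source, return plain dictionaries."""
    items = source.search(query, max_results=max_results)
    return {
        "query": query,
        "count": len(items),
        "results": [asdict(item) for item in items],
    }


source = StaticSource([
    SearchItem(title="Paper A", url="https://example.org/a", item_id="1"),
    SearchItem(title="Paper B", url="https://example.org/b"),
])
payload = example_search_tool(source, "anything", max_results=1)
print(json.dumps(payload, indent=2))  # the payload contains only JSON-friendly types
```

Keeping the source free of tool-calling concerns means it can be tested on its own, while the tool layer only handles serialization.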

Exports

The top-level agent.browsing package exposes a curated set of utilities:

ArxivBrowser: high-level arXiv helper built on shared.arxiv_parser
arxiv_search_tool, arxiv_get_paper_tool: arXiv tools
web_search_tool: DuckDuckGo web search tool
google_scholar_search_tool, pubmed_search_tool, github_repo_search_tool: manual-source backed agent tools

Manual Sources

class agent.browsing.manual.sources.base.ManualSource(*args, **kwargs)

Bases: Protocol

Protocol for manual browsing sources.

Implementations should be stateless or manage their own lightweight state.

iter_all(query, chunk_size=100, limit=None, **kwargs)

Iterate over results by fetching in chunks.

Parameters:

query (str) – Free-text query.
chunk_size (int) – Number of items to fetch per request.
limit (Optional[int]) – Optional maximum number of items to yield.
kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over search items.
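A minimal sketch of chunked iteration, assuming the iterator repeatedly requests pages of chunk_size items and stops on an empty or short page or once limit is reached. The helper name `iter_chunks` and the `fetch_page(start, size)` callable are hypothetical; the actual implementations may differ.

```python
from typing import Callable, Iterator, List, Optional


def iter_chunks(
    fetch_page: Callable[[int, int], List[str]],
    chunk_size: int = 100,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield items page by page until the provider runs dry or limit is hit."""
    yielded = 0
    start = 0
    while limit is None or yielded < limit:
        page = fetch_page(start, chunk_size)
        if not page:
            return  # empty page: nothing left to fetch
        for item in page:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        if len(page) < chunk_size:
            return  # short page: assume the provider has no more results
        start += len(page)


# In-memory stand-in for a provider, so the sketch runs without a network.
DATA = [f"item-{i}" for i in range(7)]


def fake_fetch(start: int, size: int) -> List[str]:
    return DATA[start:start + size]


print(list(iter_chunks(fake_fetch, chunk_size=3, limit=4)))
```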

search(query, max_results=25, start=0, **kwargs)

Return a single page of results for a query.

Parameters:

query (str) – Free-text query.
max_results (int) – Maximum number of results to return.
start (int) – Zero-based start index for pagination across results.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of search items for the requested page.

search_all(query, chunk_size=100, limit=None, **kwargs)

Collect results for a query into a list by consuming the iterator.

Parameters:

query (str) – Free-text query.
chunk_size (int) – Number of items to fetch per request.
limit (Optional[int]) – Optional maximum number of items to collect.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of collected search items.

class agent.browsing.manual.sources.base.SearchItem(title, url, snippet=None, item_id=None, extra=None)

Bases: object

Lightweight search result item for manual browsing.

Variables:

title – Human-readable title of the item.
url – Canonical URL for the item.
snippet – Optional short snippet or summary.
item_id – Optional stable identifier when available, e.g., a PubMed ID.
extra – Optional provider-specific metadata.

Parameters:

title (str)
url (str)
snippet (str | None)
item_id (str | None)
extra (dict | None)

extra: Optional[dict] = None

item_id: Optional[str] = None

snippet: Optional[str] = None

title: str

url: str
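The field list above is consistent with a plain dataclass. The sketch below assumes that shape; `SearchItemSketch` is a hypothetical local stand-in, not the library class, and the PubMed ID and journal name are made-up illustrative values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchItemSketch:
    """Stand-in with the same fields and defaults as documented above."""
    title: str
    url: str
    snippet: Optional[str] = None
    item_id: Optional[str] = None
    extra: Optional[dict] = None


# Only title and url are required; everything else defaults to None.
item = SearchItemSketch(
    title="Example article",
    url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
    item_id="12345678",                    # illustrative identifier
    extra={"journal": "Example Journal"},  # illustrative provider metadata
)
print(item.title, item.item_id, item.snippet)
```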

agent.browsing.manual.sources.base.paginate_results(results, limit)

Yield up to a limit of results from an iterable.

Parameters:

results (Iterable[SearchItem]) – Iterable of search items to paginate.
limit (Optional[int]) – Optional maximum number of items to yield.

Return type:

Iterator[SearchItem]

Returns:

Iterator yielding up to limit items.
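One way to get this behaviour is `itertools.islice`. The sketch below is an assumption about the semantics (a limit of None means "no cap"), not the library's actual implementation, and `paginate_sketch` is a hypothetical name.

```python
from itertools import count, islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def paginate_sketch(results: Iterable[T], limit: Optional[int]) -> Iterator[T]:
    """Lazily yield at most `limit` items; None is assumed to mean no cap."""
    # islice accepts None as its stop value and then passes everything through.
    return islice(results, limit)


# Works on unbounded iterables because nothing is materialized up front.
first_three = list(paginate_sketch(count(), 3))
everything = list(paginate_sketch(["a", "b"], None))
print(first_three, everything)
```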

Google Scholar manual browsing via DuckDuckGo site-restricted search.

This module intentionally avoids scraping Google Scholar directly. It leverages the ddgs (or legacy duckduckgo_search as a fallback) package to retrieve public result snippets limited to the Scholar domain.

class agent.browsing.manual.sources.google_scholar.GoogleScholarBrowser(*args, **kwargs)

Bases: ManualSource

Manual source for Google Scholar using site-restricted web search.

Note: result metadata is limited to title, URL, and snippet.

iter_all(query, chunk_size=100, limit=None, *, region='wt-wt', **kwargs)

Iterate through Scholar results by fetching in chunks.

Parameters:

query (str) – Free-text query string.
chunk_size (int) – Number of results fetched per request.
limit (Optional[int]) – Optional maximum number of items to yield.
region (str) – Region code for DuckDuckGo.
kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over normalized search items.

search(query, max_results=25, start=0, *, region='wt-wt', **kwargs)

Search Scholar results using DuckDuckGo site restriction.

Parameters:

query (str) – Free-text query string.
max_results (int) – Maximum number of results to return.
start (int) – Zero-based start index; applied client-side.
region (str) – Region code for DuckDuckGo.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items.
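The two ideas in this method, a site-restricted query and a client-side start offset, can be sketched without any network access. The helper names are hypothetical, and `scholar.google.com` is an assumed value for "the Scholar domain"; the real module may build its query differently.

```python
from typing import Dict, List

SCHOLAR_DOMAIN = "scholar.google.com"  # assumption: domain used for the restriction


def build_site_query(query: str, domain: str = SCHOLAR_DOMAIN) -> str:
    """Prefix the query with a site: operator to restrict results to one domain."""
    return f"site:{domain} {query.strip()}"


def apply_client_side_start(
    raw_results: List[Dict[str, str]], start: int, max_results: int
) -> List[Dict[str, str]]:
    """Drop the first `start` results locally, then keep at most `max_results`."""
    return raw_results[start:start + max_results]


# Because the offset is applied locally, the backend has to be asked for
# start + max_results items so that the requested window can be filled.
start, max_results = 3, 4
needed_from_backend = start + max_results

fake_results = [
    {"title": f"Result {i}", "href": f"https://example.org/{i}"} for i in range(10)
]
window = apply_client_side_start(fake_results[:needed_from_backend], start, max_results)
print(build_site_query("graph neural networks"), [r["title"] for r in window])
```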

search_all(query, chunk_size=100, limit=None, *, region='wt-wt', **kwargs)

Collect Scholar results for a query into a list.

Parameters:

query (str) – Free-text query string.
chunk_size (int) – Number of results fetched per request.
limit (Optional[int]) – Optional maximum number of items to collect.
region (str) – Region code for DuckDuckGo.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items.

PubMed manual browsing using NCBI E-utilities (ESearch + ESummary).

No additional dependencies are required beyond requests, which handles the network calls. Results are returned as lightweight SearchItem objects with stable PubMed IDs.

class agent.browsing.manual.sources.pubmed.PubMedBrowser(*args, **kwargs)

Bases: ManualSource

Manual source for PubMed articles using E-utilities JSON endpoints.

iter_all(query, chunk_size=100, limit=None, **kwargs)

Iterate through PubMed results by fetching in chunks.

Parameters:

query (str) – Free-text query string.
chunk_size (int) – Number of results per request.
limit (Optional[int]) – Optional maximum number of items to yield.
kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over normalized search items.

search(query, max_results=25, start=0, **kwargs)

Search PubMed and return a page of results.

This uses esearch.fcgi to obtain a list of PMIDs, then esummary.fcgi to fetch basic metadata.

Parameters:

query (str) – Free-text query string.
max_results (int) – Maximum number of results to return.
start (int) – Zero-based start index for pagination.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items with PMIDs.
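The two-step flow described above can be sketched as request parameters for the public E-utilities endpoints. The helper names are hypothetical, no request is actually sent, and the sample response is a hand-written illustration of the ESearch JSON shape with a made-up PMID; how the library itself assembles these calls is an assumption.

```python
from typing import Dict, List
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_params(query: str, max_results: int = 25, start: int = 0) -> Dict[str, str]:
    """Step 1: ask ESearch for a page of PMIDs matching the query."""
    return {
        "db": "pubmed",
        "term": query,
        "retstart": str(start),      # zero-based offset into the result set
        "retmax": str(max_results),  # page size
        "retmode": "json",
    }


def esummary_params(pmids: List[str]) -> Dict[str, str]:
    """Step 2: ask ESummary for basic metadata on those PMIDs."""
    return {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}


def extract_pmids(esearch_json: dict) -> List[str]:
    """Pull the ID list out of an ESearch JSON response."""
    return list(esearch_json.get("esearchresult", {}).get("idlist", []))


# Hand-written sample response; the PMID is illustrative only.
sample = {"esearchresult": {"count": "1", "idlist": ["12345678"]}}
pmids = extract_pmids(sample)

search_url = f"{EUTILS_BASE}/esearch.fcgi?{urlencode(esearch_params('crispr', 5, 10))}"
summary_url = f"{EUTILS_BASE}/esummary.fcgi?{urlencode(esummary_params(pmids))}"
print(search_url)
print(summary_url)
```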

search_all(query, chunk_size=100, limit=None, **kwargs)

Collect PubMed results for a query into a list.

Parameters:

query (str) – Free-text query string.
chunk_size (int) – Number of results per request.
limit (Optional[int]) – Optional maximum number of items to collect.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items.

GitHub manual browsing using the public Search API.

Respects the GITHUB_TOKEN environment variable if present to increase rate limits. Returns repository-level results ordered by stars.

class agent.browsing.manual.sources.github.GitHubRepoBrowser(*args, **kwargs)

Bases: ManualSource

Manual source for GitHub repository search.

api_url: str = 'https://api.github.com/search/repositories'

iter_all(query, chunk_size=100, limit=None, **kwargs)

Iterate through repository search results by fetching in chunks.

Parameters:

query (str) – Free-text search query.
chunk_size (int) – Number of repositories per request.
limit (Optional[int]) – Optional maximum number of items to yield.
kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over normalized repository items.

search(query, max_results=25, start=0, **kwargs)

Search repositories by query, sorted by stars in descending order.

Pagination is mapped from start and max_results to GitHub’s page and per_page parameters.

Parameters:

query (str) – Free-text search query, supports qualifiers (e.g., language:Python).
max_results (int) – Maximum number of repositories to return.
start (int) – Zero-based start index across the result stream.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized repository items.
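A possible mapping from start and max_results to GitHub's 1-based page and per_page parameters is sketched below, together with optional token headers. The helper names are hypothetical and the exact arithmetic used by the library is an assumption; note that the mapping is only exact when start is a multiple of the page size, otherwise the leading `skip` items of the fetched page fall before the requested window.

```python
import os
from typing import Dict, Optional, Tuple

API_URL = "https://api.github.com/search/repositories"
MAX_PER_PAGE = 100  # GitHub caps per_page at 100


def to_page_params(start: int, max_results: int) -> Tuple[int, int, int]:
    """Return (page, per_page, skip) for a zero-based start index."""
    per_page = max(1, min(max_results, MAX_PER_PAGE))
    page = start // per_page + 1  # GitHub pages are 1-based
    skip = start % per_page       # items on that page that precede `start`
    return page, per_page, skip


def build_params(query: str, max_results: int = 25, start: int = 0) -> Dict[str, object]:
    """Query parameters for a star-sorted repository search."""
    page, per_page, _ = to_page_params(start, max_results)
    return {"q": query, "sort": "stars", "order": "desc", "per_page": per_page, "page": page}


def build_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Add an Authorization header only when a token is available."""
    headers = {"Accept": "application/vnd.github+json"}
    if token is None:
        token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


print(build_params("language:Python agents", max_results=25, start=50))
```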

search_all(query, chunk_size=100, limit=None, **kwargs)

Collect repository search results for a query into a list.

Parameters:

query (str) – Free-text search query.
chunk_size (int) – Number of repositories per request.
limit (Optional[int]) – Optional maximum number of items to collect.
kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized repository items.

ArXiv Manual Browser

Manual browsing utilities for arXiv.

Provides a simple ArxivBrowser class that accepts search queries and returns results in a convenient, strongly typed form using the shared arXiv parser.

class agent.browsing.manual.manual.ArxivBrowser(downloads_dir='downloads')

Bases: object

High-level wrapper for performing arXiv searches.

The browser exposes simple methods to:

Fetch a page of results for a query
Iterate over all results for a query in chunks
Retrieve a single paper by arXiv ID

Example:

from agent.browsing.manual import ArxivBrowser

browser = ArxivBrowser()
page = browser.search("transformers AND speech", max_results=5)
print([p.title for p in page])

Parameters:

downloads_dir (str)

get(arxiv_id)

Retrieve a single paper by arXiv ID.

Parameters:

arxiv_id (str) – The arXiv identifier (with or without version suffix).

Return type:

Optional[ArxivPaper]

Returns:

The corresponding ArxivPaper instance if found; otherwise None.
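Since the identifier may arrive with or without a version suffix, a caller might want to normalize it first. The helper below is a hypothetical sketch of such normalization using a regular expression; whether and how ArxivBrowser does this internally is not stated here.

```python
import re
from typing import Optional, Tuple

# Base identifier followed by an optional "v<digits>" suffix.
_ARXIV_ID_RE = re.compile(r"(?P<base>.+?)(?:v(?P<version>\d+))?")


def split_arxiv_id(arxiv_id: str) -> Tuple[str, Optional[int]]:
    """Split an arXiv identifier into (base_id, version or None)."""
    match = _ARXIV_ID_RE.fullmatch(arxiv_id.strip())
    if match is None:
        raise ValueError("empty arXiv identifier")
    version = match.group("version")
    return match.group("base"), int(version) if version is not None else None


print(split_arxiv_id("2101.00001v2"))
print(split_arxiv_id("hep-th/9901001"))
```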

iter_all(query, categories=None, date_from_days=None, chunk_size=100, limit=None)

Iterate over all results for a query by fetching in chunks.

Parameters:

query (str) – Free-text search query.
categories (Optional[List[str]]) – Optional list of arXiv category filters.
date_from_days (Optional[int]) – If provided, limit results to within the last N days.
chunk_size (int) – Number of results fetched per request.
limit (Optional[int]) – If provided, stop after yielding at most limit results.

Yields:

ArxivPaper instances one by one, until exhausted or limit reached.

Return type:

Iterator[ArxivPaper]

search(query, max_results=25, start=0, categories=None, date_from_days=None)

Search arXiv and return a single page of results.

Parameters:

query (str) – Free-text search query.
max_results (int) – Maximum number of results to return in this page.
start (int) – Pagination start index (0-based) across the full result set.
categories (Optional[List[str]]) – Optional list of arXiv category filters (e.g., ["cs.AI"]).
date_from_days (Optional[int]) – If provided, limit results to within the last N days.

Return type:

List[ArxivPaper]

Returns:

A page of results.
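The two optional filters can be pictured as follows. This is a sketch under assumptions: it combines categories with the arXiv API's `cat:` field prefix and turns date_from_days into a UTC cutoff timestamp, while the shared arXiv parser may implement both differently. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def with_categories(query: str, categories: Optional[List[str]]) -> str:
    """AND the query with an OR-group of category filters, if any are given."""
    if not categories:
        return query
    cats = " OR ".join(f"cat:{c}" for c in categories)
    return f"({query}) AND ({cats})"


def cutoff_for(date_from_days: Optional[int], now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the earliest allowed timestamp, or None when no date filter applies."""
    if date_from_days is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(days=date_from_days)


fixed_now = datetime(2024, 1, 8, tzinfo=timezone.utc)  # fixed clock for a reproducible demo
print(with_categories("transformers AND speech", ["cs.CL", "eess.AS"]))
print(cutoff_for(7, now=fixed_now))
```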

search_all(query, categories=None, date_from_days=None, chunk_size=100, limit=None)

Collect results for a query into a list by consuming the iterator.

Parameters:

query (str) – Free-text search query.
categories (Optional[List[str]]) – Optional list of arXiv category filters.
date_from_days (Optional[int]) – If provided, limit results to within the last N days.
chunk_size (int) – Number of results fetched per request.
limit (Optional[int]) – If provided, stop after collecting at most limit results.

Return type:

List[ArxivPaper]

Returns:

Collected results list.
