Browsing

Overview

The browsing package provides two complementary layers:

  • Manual sources: lightweight, stateless classes to perform provider-specific searches and return normalized SearchItem results.

  • Agent tools: functions decorated for the Agents SDK that wrap manual sources (or the arXiv parser) and return JSON-serializable dictionaries for tool calling.

Exports

The top-level agent.browsing package exposes a curated set of utilities:

  • ArxivBrowser: high-level arXiv helper built on shared.arxiv_parser

  • arxiv_search_tool, arxiv_get_paper_tool: arXiv tools

  • web_search_tool: DuckDuckGo web search tool

  • google_scholar_search_tool, pubmed_search_tool, github_repo_search_tool: agent tools backed by manual sources

Manual Sources

class agent.browsing.manual.sources.base.ManualSource(*args, **kwargs)

Bases: Protocol

Protocol for manual browsing sources.

Implementations should be stateless or manage their own lightweight state.

iter_all(query, chunk_size=100, limit=None, **kwargs)

Iterate over results by fetching in chunks.

Parameters:
  • query (str) – Free-text query.

  • chunk_size (int) – Number of items to fetch per request.

  • limit (Optional[int]) – Optional maximum number of items to yield.

  • kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over search items.

search(query, max_results=25, start=0, **kwargs)

Return a single page of results for a query.

Parameters:
  • query (str) – Free-text query.

  • max_results (int) – Maximum number of results to return.

  • start (int) – Zero-based start index for pagination across results.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of search items for the requested page.

search_all(query, chunk_size=100, limit=None, **kwargs)

Collect results for a query into a list by consuming the iterator.

Parameters:
  • query (str) – Free-text query.

  • chunk_size (int) – Number of items to fetch per request.

  • limit (Optional[int]) – Optional maximum number of items to collect.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of collected search items.
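
As an illustration of the protocol, the following minimal sketch implements the three methods over an in-memory list. SearchItem is redefined locally as a stand-in so the snippet runs on its own, and InMemorySource is a hypothetical class that is not part of the package. The chunking loop shows one plausible way to build iter_all on top of search; the real sources may differ.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class SearchItem:
    """Local stand-in mirroring the documented SearchItem fields."""
    title: str
    url: str
    snippet: Optional[str] = None
    item_id: Optional[str] = None
    extra: Optional[dict] = None


class InMemorySource:
    """Hypothetical source exposing the ManualSource method set."""

    def __init__(self, items: List[SearchItem]) -> None:
        self._items = items

    def search(self, query: str, max_results: int = 25, start: int = 0,
               **kwargs: object) -> List[SearchItem]:
        # Case-insensitive title match, then slice out the requested page.
        hits = [it for it in self._items if query.lower() in it.title.lower()]
        return hits[start:start + max_results]

    def iter_all(self, query: str, chunk_size: int = 100,
                 limit: Optional[int] = None, **kwargs: object) -> Iterator[SearchItem]:
        yielded = 0
        start = 0
        while limit is None or yielded < limit:
            page = self.search(query, max_results=chunk_size, start=start, **kwargs)
            for item in page:
                if limit is not None and yielded >= limit:
                    return
                yield item
                yielded += 1
            if not page or len(page) < chunk_size:
                return  # empty or short page: the source is exhausted
            start += chunk_size

    def search_all(self, query: str, chunk_size: int = 100,
                   limit: Optional[int] = None, **kwargs: object) -> List[SearchItem]:
        # Materialize the iterator into a list.
        return list(self.iter_all(query, chunk_size=chunk_size, limit=limit, **kwargs))


source = InMemorySource(
    [SearchItem(title=f"Paper {i}", url=f"https://example.org/{i}") for i in range(7)]
)
first_page = source.search("paper", max_results=3)
capped = source.search_all("paper", chunk_size=3, limit=5)
```

Here search_all simply consumes iter_all, which matches the description of search_all above; limit is enforced inside the iterator so that no extra items are yielded once the cap is reached.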

class agent.browsing.manual.sources.base.SearchItem(title, url, snippet=None, item_id=None, extra=None)

Bases: object

Lightweight search result item for manual browsing.

Variables:
  • title – Human-readable title of the item.

  • url – Canonical URL for the item.

  • snippet – Optional short snippet or summary.

  • item_id – Optional stable identifier when available, e.g., a PubMed ID.

  • extra – Optional provider-specific metadata.

Parameters:
  • title (str)

  • url (str)

  • snippet (str | None)

  • item_id (str | None)

  • extra (dict | None)

extra: Optional[dict] = None
item_id: Optional[str] = None
snippet: Optional[str] = None
title: str
url: str
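
Since the agent tools are described as returning JSON-serializable dictionaries, a typical step is converting a SearchItem into a plain dict. The sketch below assumes SearchItem behaves like a dataclass (the reference does not state this) and redefines it locally; item_to_dict is a hypothetical helper and the PMID shown is a placeholder value.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SearchItem:
    """Local stand-in mirroring the documented SearchItem fields."""
    title: str
    url: str
    snippet: Optional[str] = None
    item_id: Optional[str] = None
    extra: Optional[dict] = None


def item_to_dict(item: SearchItem) -> Dict[str, Any]:
    """Convert an item to a dict, dropping optional fields that are unset."""
    return {key: value for key, value in asdict(item).items() if value is not None}


item = SearchItem(
    title="Example article",
    url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
    item_id="12345678",  # placeholder identifier for illustration
)
payload = json.dumps(item_to_dict(item), sort_keys=True)
```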

agent.browsing.manual.sources.base.paginate_results(results, limit)

Yield up to a limit of results from an iterable.

Parameters:
  • results – Iterable of search items to draw from.

  • limit – Maximum number of items to yield.

Return type:

Iterator[SearchItem]

Returns:

Iterator yielding up to limit items.
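
A helper with this behavior can be sketched with itertools.islice. The name paginate_results_sketch is hypothetical, and treating limit=None as "no cap" is an assumption, since the reference does not say how a missing limit is handled.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def paginate_results_sketch(results: Iterable[T], limit: Optional[int]) -> Iterator[T]:
    """Lazily yield at most `limit` items; None means no cap (an assumption)."""
    return islice(results, limit)


def numbers() -> Iterator[int]:
    # Infinite generator: only safe to consume because the cap is applied lazily.
    n = 0
    while True:
        yield n
        n += 1


first_three = list(paginate_results_sketch(numbers(), 3))
```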

Google Scholar manual browsing via DuckDuckGo site-restricted search.

This module intentionally avoids scraping Google Scholar directly. It uses the ddgs package (falling back to the legacy duckduckgo_search package) to retrieve public result snippets restricted to the Scholar domain.

class agent.browsing.manual.sources.google_scholar.GoogleScholarBrowser(*args, **kwargs)

Bases: ManualSource

Manual source for Google Scholar using site-restricted web search.

Note: result metadata is limited to title, URL, and snippet.

iter_all(query, chunk_size=100, limit=None, *, region='wt-wt', **kwargs)

Iterate through Scholar results by fetching in chunks.

Parameters:
  • query (str) – Free-text query string.

  • chunk_size (int) – Number of results fetched per request.

  • limit (Optional[int]) – Optional maximum number of items to yield.

  • region (str) – Region code for DuckDuckGo.

  • kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over normalized search items.

search(query, max_results=25, start=0, *, region='wt-wt', **kwargs)

Search Scholar results using DuckDuckGo site restriction.

Parameters:
  • query (str) – Free-text query string.

  • max_results (int) – Maximum number of results to return.

  • start (int) – Zero-based start index; applied client-side.

  • region (str) – Region code for DuckDuckGo.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items.
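
Because start is applied client-side, one way to serve a page is to request start + max_results hits and slice locally. The sketch below shows that idea with an injected fetch callable standing in for the real DuckDuckGo client. The site: prefix, the scholar.google.com domain string, the raw hit keys, and all helper names are assumptions made for illustration.

```python
from typing import Callable, Dict, List

RawHit = Dict[str, str]


def scholar_query(query: str, domain: str = "scholar.google.com") -> str:
    """Prefix the query with a site restriction (assumed query form)."""
    return f"site:{domain} {query}"


def search_page(fetch: Callable[[str, int], List[RawHit]], query: str,
                max_results: int = 25, start: int = 0) -> List[RawHit]:
    # No offset parameter is assumed on the backend, so over-fetch and slice.
    hits = fetch(scholar_query(query), start + max_results)
    return hits[start:start + max_results]


def fake_fetch(restricted_query: str, count: int) -> List[RawHit]:
    # Stand-in backend that pretends only 40 results exist.
    return [
        {"title": f"Result {i}", "href": f"https://example.org/r/{i}", "body": restricted_query}
        for i in range(min(count, 40))
    ]


page = search_page(fake_fetch, "graph neural networks", max_results=5, start=10)
```

The cost of this approach grows with start, since every page re-fetches all earlier hits; that trade-off is inherent to client-side offsets.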

search_all(query, chunk_size=100, limit=None, *, region='wt-wt', **kwargs)

Collect Scholar results for a query into a list.

Parameters:
  • query (str) – Free-text query string.

  • chunk_size (int) – Number of results fetched per request.

  • limit (Optional[int]) – Optional maximum number of items to collect.

  • region (str) – Region code for DuckDuckGo.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items.

PubMed manual browsing using NCBI E-utilities (ESearch + ESummary).

No dependencies beyond requests are required. Network calls return lightweight SearchItem objects with stable PubMed IDs.

class agent.browsing.manual.sources.pubmed.PubMedBrowser(*args, **kwargs)

Bases: ManualSource

Manual source for PubMed articles using E-utilities JSON endpoints.

iter_all(query, chunk_size=100, limit=None, **kwargs)

Iterate through PubMed results by fetching in chunks.

Parameters:
  • query (str) – Free-text query string.

  • chunk_size (int) – Number of results per request.

  • limit (Optional[int]) – Optional maximum number of items to yield.

  • kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over normalized search items.

search(query, max_results=25, start=0, **kwargs)

Search PubMed and return a page of results.

This uses esearch.fcgi to obtain a list of PMIDs, then esummary.fcgi to fetch basic metadata.

Parameters:
  • query (str) – Free-text query string.

  • max_results (int) – Maximum number of results to return.

  • start (int) – Zero-based start index for pagination.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items with PMIDs.
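
The two-step ESearch and ESummary flow can be sketched as URL construction plus response parsing. The endpoints and the db, term, retstart, retmax, retmode, and id parameters are standard E-utilities names; the helper functions, the mapping of start and max_results onto retstart and retmax, and the sample payload are assumptions for illustration, not the package's actual code.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(query: str, max_results: int = 25, start: int = 0) -> str:
    """Build the ESearch request that returns a page of PMIDs."""
    params = {"db": "pubmed", "term": query, "retstart": start,
              "retmax": max_results, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esummary_url(pmids: List[str]) -> str:
    """Build the ESummary request for a batch of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"


def items_from_esummary(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn an ESummary JSON payload into title/url/item_id records."""
    result = payload.get("result", {})
    items = []
    for pmid in result.get("uids", []):
        record = result.get(pmid, {})
        items.append({
            "title": record.get("title", ""),
            "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
            "item_id": pmid,
        })
    return items


# Made-up sample payload in the ESummary JSON shape, for illustration only.
sample = {"result": {"uids": ["111", "222"],
                     "111": {"title": "First"}, "222": {"title": "Second"}}}
items = items_from_esummary(sample)
```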

search_all(query, chunk_size=100, limit=None, **kwargs)

Collect PubMed results for a query into a list.

Parameters:
  • query (str) – Free-text query string.

  • chunk_size (int) – Number of results per request.

  • limit (Optional[int]) – Optional maximum number of items to collect.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized search items.

GitHub manual browsing using the public Search API.

If the GITHUB_TOKEN environment variable is set, it is used to authenticate requests, which raises the applicable rate limits. Returns repository-level results ordered by stars.

class agent.browsing.manual.sources.github.GitHubRepoBrowser(*args, **kwargs)

Bases: ManualSource

Manual source for GitHub repository search.

api_url: str = 'https://api.github.com/search/repositories'

iter_all(query, chunk_size=100, limit=None, **kwargs)

Iterate through repository search results by fetching in chunks.

Parameters:
  • query (str) – Free-text search query.

  • chunk_size (int) – Number of repositories per request.

  • limit (Optional[int]) – Optional maximum number of items to yield.

  • kwargs (object)

Return type:

Iterator[SearchItem]

Returns:

Iterator over normalized repository items.

search(query, max_results=25, start=0, **kwargs)

Search repositories by query, sorted by stars in descending order.

Pagination is mapped from start and max_results to GitHub’s page and per_page parameters.

Parameters:
  • query (str) – Free-text search query, supports qualifiers (e.g., language:Python).

  • max_results (int) – Maximum number of repositories to return.

  • start (int) – Zero-based start index across the result stream.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized repository items.
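
The mapping from start and max_results to page and per_page can be sketched as follows. GitHub pages are 1-based and per_page is capped at 100; the build_request helper, the Accept header choice, and the handling of the token are assumptions for illustration. The mapping is exact only when start is a multiple of the page size; other offsets would need trimming by the caller.

```python
import os
from typing import Dict, Mapping, Optional, Tuple, Union

Params = Dict[str, Union[str, int]]


def build_request(query: str, max_results: int = 25, start: int = 0,
                  env: Optional[Mapping[str, str]] = None) -> Tuple[Params, Dict[str, str]]:
    """Return (query params, headers) for a repository search request."""
    env = os.environ if env is None else env
    per_page = max(1, min(max_results, 100))  # GitHub caps per_page at 100
    params: Params = {
        "q": query,
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
        "page": start // per_page + 1,  # convert zero-based offset to 1-based page
    }
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get higher rate limits.
        headers["Authorization"] = f"Bearer {token}"
    return params, headers


params, headers = build_request("language:Python agents", max_results=25, start=50,
                                env={"GITHUB_TOKEN": "example-token"})
```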

search_all(query, chunk_size=100, limit=None, **kwargs)

Collect repository search results for a query into a list.

Parameters:
  • query (str) – Free-text search query.

  • chunk_size (int) – Number of repositories per request.

  • limit (Optional[int]) – Optional maximum number of items to collect.

  • kwargs (object)

Return type:

List[SearchItem]

Returns:

List of normalized repository items.

ArXiv Manual Browser

Manual browsing utilities for arXiv.

Provides a simple ArxivBrowser class that accepts search queries and returns results in a convenient, strongly typed form using the shared arXiv parser.

class agent.browsing.manual.manual.ArxivBrowser(downloads_dir='downloads')

Bases: object

High-level wrapper for performing arXiv searches.

The browser exposes simple methods to:

  • Fetch a page of results for a query

  • Iterate over all results for a query in chunks

  • Retrieve a single paper by arXiv ID

Example:

from agent.browsing.manual import ArxivBrowser

browser = ArxivBrowser()
page = browser.search("transformers AND speech", max_results=5)
print([p.title for p in page])

Parameters:

downloads_dir (str)

get(arxiv_id)

Retrieve a single paper by arXiv ID.

Parameters:

arxiv_id (str) – The arXiv identifier (with or without version suffix).

Return type:

Optional[ArxivPaper]

Returns:

The corresponding ArxivPaper instance if found; otherwise None.
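
Since get accepts identifiers with or without a version suffix, a caller may want to separate the two parts before comparing IDs. The helper below is a hypothetical sketch of that normalization and is not taken from the package; it handles both new-style IDs and old-style archive/number IDs.

```python
import re
from typing import Optional, Tuple

# Base identifier followed by an optional "v<digits>" version suffix.
_ARXIV_ID = re.compile(r"(?P<base>.+?)(?:v(?P<version>\d+))?")


def split_arxiv_id(arxiv_id: str) -> Tuple[str, Optional[int]]:
    """Split an arXiv ID into (base_id, version); version is None if absent."""
    match = _ARXIV_ID.fullmatch(arxiv_id.strip())
    if match is None:
        raise ValueError(f"not a usable arXiv ID: {arxiv_id!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version is not None else None


versioned = split_arxiv_id("2101.00001v2")
unversioned = split_arxiv_id("hep-th/9901001")
```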

iter_all(query, categories=None, date_from_days=None, chunk_size=100, limit=None)

Iterate over all results for a query by fetching in chunks.

Parameters:
  • query (str) – Free-text search query.

  • categories (Optional[List[str]]) – Optional list of arXiv category filters.

  • date_from_days (Optional[int]) – If provided, limit results to within the last N days.

  • chunk_size (int) – Number of results fetched per request.

  • limit (Optional[int]) – If provided, stop after yielding at most limit results.

Yields:

ArxivPaper instances one by one, until exhausted or limit reached.

Return type:

Iterator[ArxivPaper]

search(query, max_results=25, start=0, categories=None, date_from_days=None)

Search arXiv and return a single page of results.

Parameters:
  • query (str) – Free-text search query.

  • max_results (int) – Maximum number of results to return in this page.

  • start (int) – Pagination start index (0-based) across the full result set.

  • categories (Optional[List[str]]) – Optional list of arXiv category filters (e.g., ["cs.AI"]).

  • date_from_days (Optional[int]) – If provided, limit results to within the last N days.

Return type:

List[ArxivPaper]

Returns:

A page of results.
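
How categories and date_from_days are applied is delegated to the shared parser and is not documented here. As a rough sketch under that caveat, the category filter could be expressed with the arXiv API's cat: field prefix and boolean operators, and the date window as a cutoff timestamp. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def build_search_query(query: str, categories: Optional[List[str]] = None) -> str:
    """Combine a free-text query with an OR-ed category clause."""
    if not categories:
        return query
    cat_clause = " OR ".join(f"cat:{category}" for category in categories)
    return f"({query}) AND ({cat_clause})"


def cutoff_datetime(date_from_days: Optional[int],
                    now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the earliest accepted timestamp, or None when no window is set."""
    if date_from_days is None:
        return None
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=date_from_days)


combined = build_search_query("transformers AND speech", ["cs.CL", "eess.AS"])
cutoff = cutoff_datetime(30, now=datetime(2024, 1, 31, tzinfo=timezone.utc))
```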

search_all(query, categories=None, date_from_days=None, chunk_size=100, limit=None)

Collect results for a query into a list by consuming the iterator.

Parameters:
  • query (str) – Free-text search query.

  • categories (Optional[List[str]]) – Optional list of arXiv category filters.

  • date_from_days (Optional[int]) – If provided, limit results to within the last N days.

  • chunk_size (int) – Number of results fetched per request.

  • limit (Optional[int]) – If provided, stop after collecting at most limit results.

Return type:

List[ArxivPaper]

Returns:

Collected results list.

Agent Tools