pi-search-hub

Unified web search + content extraction extension for pi with 12 backend providers (all working). One web_search tool, one web_read tool, auto-fallback, RRF-ranked combine mode, and credential resolution via env/shell/literal.

Installation

pi install npm:pi-search-hub

Note for DuckDuckGo backend: Requires the ddgs Python package. Install with:

Linux/macOS: pip3 install ddgs

Windows: pip install ddgs

Usage

Web Search

After installing, just ask naturally:

Search for recent AI agent frameworks.

What's the latest news on Llama 4?

Or use the tools directly — the agent picks the best configured backend automatically:

web_search — search the web with auto-fallback or parallel combine mode
web_read — fetch any URL as clean markdown

Combine Mode

Set combine=true to query ALL enabled backends in parallel with Reciprocal Rank Fusion (RRF) ranking:

Search for "Rust vs Go performance benchmarks" with combine=true to get results from all backends

Combine mode benefits:

Broader coverage across multiple search indexes
Results ranked by RRF — position-based scoring across all backends
Each result shows which backend found it
URL deduplication with content-aware merge (prefers richest result)
Useful for comprehensive research or when you want diverse sources

Tradeoff: Uses more API quota per query (all backends are called), but you get more comprehensive results.

Read Web Pages

Fetch any URL as clean markdown — great for extracting article content, docs, or reference pages:

Read https://docs.example.com/api-reference

The web_read tool supports:

objective — specific question to focus extraction
keywords — relevant terms to highlight on long pages
mode — rush for speed (return innerText) or smart (markdown extraction)
fresh — bypass cache when freshness matters

Supported Backends

#	Backend	Free Tier	API Key?	How to get key
1	DuckDuckGo	Unlimited (rate-limited)	No	`pip install ddgs` (Linux/macOS: `pip3`)
2	Jina AI	Free tier (API key req.)	Yes	jina.ai
3	Marginalia Search	Unlimited (rate-limited)	No†	marginalia.nu
4	Tavily	1,000 calls/month	Yes	tavily.com
5	Serper (Google)	2,500 queries/month	Yes	serper.dev
6	Brave	2,000 queries/month	Yes	brave.com/search/api
7	Firecrawl	500 free credits	Yes	firecrawl.dev
8	Exa	10 QPS rate-limited	Yes	exa.ai
9	LangSearch	Free, no credit card required	Yes	langsearch.com
10	WebSearchAPI.ai	2,000 free credits	Yes	websearchapi.ai
11	Perplexity Sonar	Unlimited free queries	Yes	perplexity.ai
12	SearXNG	Self-hosted, unlimited	No	docs.searxng.org

† Marginalia Search uses the literal string "public" as a shared API key. No registration is required, but requests are subject to a shared rate limit.

Jina AI (s.jina.ai) returns full markdown content. Free tier requires a free API key from jina.ai.

SearXNG is a self-hosted metasearch engine. Run your own instance (or use a public one), no API key required. Configure the instance URL in .pi/search.json.

Removed: Stract, UnSearch, BoardReader, EntireWeb, Search1API, FreeAPITools.dev — no longer viable (public API removed, requires payment, or endpoint not implemented).

Configuration

Configure backends globally (all projects) or per-project:

Global: ~/.pi/agent/extensions/search.json
Project: .pi/search.json (project takes precedence)

{
  "defaultBackend": "auto",
  "backends": {
    "duckduckgo": { "enabled": true },
    "jina":       { "enabled": true, "apiKey": "JINA_API_KEY" },
    "marginalia": { "enabled": true },
    "serper":     { "enabled": true, "apiKey": "SERPER_API_KEY" },
    "tavily":     { "enabled": true, "apiKey": "TAVILY_API_KEY" },
    "brave":      { "enabled": true, "apiKey": "BRAVE_API_KEY" },
    "exa":        { "enabled": true, "apiKey": "EXA_API_KEY" },
    "firecrawl":  { "enabled": true, "apiKey": "FIRECRAWL_API_KEY" },
    "langsearch": { "enabled": true, "apiKey": "LANGSEARCH_API_KEY" },
    "websearchapi":{ "enabled": true, "apiKey": "WEBSEARCHAPI_API_KEY" },
    "perplexity": { "enabled": true, "apiKey": "PERPLEXITY_API_KEY" },
    "searxng":    { "enabled": true, "instanceUrl": "http://localhost:8888" }
  }
}
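
The precedence rule above (project over global) can be pictured as a merge of the two files. The following is a minimal sketch, not the extension's actual loader: `SearchConfig`, `BackendConfig`, and `mergeConfig` are hypothetical names, the field-by-field merge per backend is an assumption, and so is the "auto" default when neither file sets defaultBackend.

```typescript
// Sketch only: type and function names are hypothetical, and the
// field-by-field merge is an assumption about "project takes precedence".
interface BackendConfig {
  enabled?: boolean;
  apiKey?: string;
  instanceUrl?: string;
}

interface SearchConfig {
  defaultBackend?: string;
  backends?: Record<string, BackendConfig>;
}

function mergeConfig(globalCfg: SearchConfig, projectCfg: SearchConfig): SearchConfig {
  // Start from a copy of the global backends.
  const backends: Record<string, BackendConfig> = { ...(globalCfg.backends ?? {}) };
  for (const [name, override] of Object.entries(projectCfg.backends ?? {})) {
    // Project fields win; fields the project omits fall through from the global file.
    backends[name] = { ...(backends[name] ?? {}), ...override };
  }
  return {
    // Assumed default: "auto" when neither file sets defaultBackend.
    defaultBackend: projectCfg.defaultBackend ?? globalCfg.defaultBackend ?? "auto",
    backends,
  };
}

const globalCfg: SearchConfig = {
  defaultBackend: "auto",
  backends: {
    serper: { enabled: true, apiKey: "SERPER_API_KEY" },
    tavily: { enabled: true, apiKey: "TAVILY_API_KEY" },
  },
};
const projectCfg: SearchConfig = {
  backends: { serper: { enabled: false } },
};

const merged = mergeConfig(globalCfg, projectCfg);
console.log(JSON.stringify(merged, null, 2));
```

In this sketch the project file disables serper for one repository while the global key reference and the tavily entry stay in effect.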

Credential Resolution

The apiKey field supports four formats (following the pi-web-providers convention):

`apiKey` value	Resolved from	Notes
`"SERPER_API_KEY"`	`process.env.SERPER_API_KEY`	ALL_CAPS → env var
`"!pass show api/serper"`	stdout of shell command (cached)	`!` prefix → exec
`"sk-abc123..."`	Used as-is	Literal key (backwards compatible)
(unset)	`SEARCH_<BACKEND>_API_KEY` env fallback	Auto-enables backend

Env var references: Any ALL_CAPS string is treated as an environment variable name, not as a literal key. If the referenced env var is unset, a warning is printed, so a literal key that happens to be ALL_CAPS is flagged rather than silently discarded.

Shell commands: Commands prefixed with ! are executed via execSync with a 5s timeout. Results are cached and invalidated when config is reloaded (editing the config file clears the cache).
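
The four resolution rules in the table can be sketched as one function. This is an illustrative sketch under stated assumptions, not the extension's code: `resolveApiKey`, `Env`, and `RunShell` are hypothetical names, the ALL_CAPS regular expression is a guess at the matching rule, and the environment and shell runner are passed in as parameters so the example runs without touching the real process. In practice the runner might wrap execSync with the 5s timeout described above.

```typescript
// Hypothetical sketch of the resolution order; names are illustrative only.
type Env = Record<string, string | undefined>;
type RunShell = (command: string) => string;

// Cache of shell-command results (the real extension clears its cache on config reload).
const shellCache = new Map<string, string>();

function resolveApiKey(
  backend: string,
  apiKey: string | undefined,
  env: Env,
  runShell: RunShell,
): string | undefined {
  if (apiKey === undefined || apiKey === "") {
    // (unset): fall back to SEARCH_<BACKEND>_API_KEY.
    return env[`SEARCH_${backend.toUpperCase()}_API_KEY`];
  }
  if (apiKey.startsWith("!")) {
    // "!" prefix: run the command once and cache its trimmed stdout.
    const command = apiKey.slice(1);
    let value = shellCache.get(command);
    if (value === undefined) {
      value = runShell(command).trim();
      shellCache.set(command, value);
    }
    return value;
  }
  if (/^[A-Z][A-Z0-9_]*$/.test(apiKey)) {
    // ALL_CAPS (assumed pattern): treat as an env var name and warn if it is unset.
    const value = env[apiKey];
    if (value === undefined) console.warn(`search: env var ${apiKey} is not set`);
    return value;
  }
  return apiKey; // anything else is used as a literal key
}

const demoEnv: Env = {
  SERPER_API_KEY: "key-from-env",
  SEARCH_TAVILY_API_KEY: "key-from-fallback",
};
let shellCalls = 0;
const fakeShell: RunShell = () => {
  shellCalls += 1;
  return "key-from-shell\n";
};

console.log(resolveApiKey("serper", "SERPER_API_KEY", demoEnv, fakeShell));
console.log(resolveApiKey("serper", "!pass show api/serper", demoEnv, fakeShell));
console.log(resolveApiKey("serper", "sk-abc123", demoEnv, fakeShell));
console.log(resolveApiKey("tavily", undefined, demoEnv, fakeShell));
```

The cache means a second lookup of the same shell command does not spawn another process, which is why rotating a shell-backed key requires a config reload.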

Convenience env vars: Backends are auto-enabled when these env vars are set (even with no config entry):

export SEARCH_SERPER_API_KEY="sk-..."
export SEARCH_TAVILY_API_KEY="sk-..."
export SEARCH_EXA_API_KEY="sk-..."
# ...

Alternatively, reference any env var by name from the config file:

{
  "backends": {
    "serper": { "enabled": true, "apiKey": "SERPER_API_KEY" }
  }
}

To rotate a shell-command key: Update the secret in your password manager, then trigger a config reload (edit the config file, or wait 10s for automatic refresh).

Or use the interactive setup:

/search-setup

Commands

Command	Description
`/search-setup`	Interactive prompt to configure API keys for any backend
`/search-status`	Show which backends are active and which have keys

How auto mode works

Fallback Mode (default, `combine=false`)

Tries each enabled backend in order from your config
If a backend fails (rate limit, auth error, etc.), moves to the next one
DuckDuckGo requires no API key; Jina AI needs a free API key. Both serve as safety nets
Returns results from the first backend that succeeds
If all backends fail, reports the collected errors
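
The fallback steps above can be sketched as a simple loop. This is a minimal illustration, not the extension's implementation: `searchWithFallback`, `SearchFn`, and `FallbackOutcome` are hypothetical names, and the sketch assumes that any backend call that does not throw counts as a success.

```typescript
// Hypothetical sketch of fallback mode; names and shapes are illustrative.
interface SearchResult {
  url: string;
  title: string;
  snippet?: string;
}
type SearchFn = (query: string, numResults: number) => Promise<SearchResult[]>;

interface FallbackOutcome {
  backend?: string;        // the backend that succeeded, if any
  results: SearchResult[];
  errors: string[];        // one entry per backend that failed before that
}

async function searchWithFallback(
  backends: Array<[string, SearchFn]>, // [name, search] pairs in config order
  query: string,
  numResults: number,
): Promise<FallbackOutcome> {
  const errors: string[] = [];
  for (const [name, search] of backends) {
    try {
      // First backend that returns without throwing wins.
      const results = await search(query, numResults);
      return { backend: name, results, errors };
    } catch (err) {
      // Record the failure and move on to the next backend.
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Every backend failed: report the collected errors.
  return { results: [], errors };
}

const demo = searchWithFallback(
  [
    ["serper", async () => { throw new Error("429 rate limited"); }],
    ["duckduckgo", async () => [{ url: "https://example.com", title: "Example" }]],
  ],
  "test query",
  5,
);
demo.then((outcome) => console.log(outcome.backend, outcome.errors));
```

Here the first stand-in backend throws a rate-limit error, so the second one serves the query and the earlier error is kept for reporting.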

Combine Mode (`combine=true`)

Queries ALL enabled backends in parallel
Each backend receives numResults / numBackends as a target
Results are merged using Reciprocal Rank Fusion (RRF) — position-based scoring that works across incompatible ranking systems
Each result shows its source backend (e.g., *Source: Tavily*)
URL dedup prefers the result with the richest content (content > snippet)
Backend statistics are displayed (which succeeded, result counts, errors)
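
The parallel query, per-backend target, source tagging, and content-aware dedup described above might look like the sketch below. It is illustrative only: `combineSearch`, `BackendOutcome`, and `richness` are hypothetical names, rounding the per-backend target up is an assumption, and URLs are compared as exact strings with no normalization. Ranking is left out here because it is covered under RRF Scoring.

```typescript
// Hypothetical sketch of combine mode (ranking omitted); names are illustrative.
interface SearchResult {
  url: string;
  title: string;
  snippet?: string;
  content?: string;
  source?: string;
}
type SearchFn = (query: string, numResults: number) => Promise<SearchResult[]>;

interface BackendOutcome {
  backend: string;
  results: SearchResult[];
  error?: string;
}

// Full content beats a snippet, which beats neither.
const richness = (r: SearchResult): number => (r.content ? 2 : r.snippet ? 1 : 0);

async function combineSearch(
  backends: Record<string, SearchFn>,
  query: string,
  numResults: number,
): Promise<{ results: SearchResult[]; stats: BackendOutcome[] }> {
  const entries = Object.entries(backends);
  // Per-backend target: numResults / numBackends, rounded up (assumption).
  const perBackend = Math.max(1, Math.ceil(numResults / Math.max(1, entries.length)));

  // Start every backend at once; a failure becomes a stats entry instead of rejecting.
  const stats = await Promise.all(
    entries.map(async ([backend, search]): Promise<BackendOutcome> => {
      try {
        return { backend, results: await search(query, perBackend) };
      } catch (err) {
        return { backend, results: [], error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );

  // Deduplicate by URL, keeping the richest copy and tagging it with its source.
  const byUrl = new Map<string, SearchResult>();
  for (const { backend, results } of stats) {
    for (const result of results) {
      const tagged: SearchResult = { ...result, source: backend };
      const existing = byUrl.get(result.url);
      if (existing === undefined || richness(tagged) > richness(existing)) {
        byUrl.set(result.url, tagged);
      }
    }
  }
  return { results: Array.from(byUrl.values()), stats };
}

const combined = combineSearch(
  {
    brave: async () => [{ url: "https://example.com/a", title: "A", snippet: "short" }],
    tavily: async () => [{ url: "https://example.com/a", title: "A", content: "full text" }],
    exa: async () => { throw new Error("401 unauthorized"); },
  },
  "test query",
  6,
);
combined.then(({ results, stats }) => console.log(results, stats));
```

In the usage above, both stand-in backends return the same URL, and the copy with full content replaces the snippet-only copy, while the failing backend appears only in the statistics.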

RRF Scoring

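RRF assigns each result a score of Σ(1 / (60 + rank_i)), summed over every backend i that returned it, where rank_i is the result's position in that backend's list. Results are ranked by score, then by the number of backends that found them. Because the score rewards high positions as well as agreement, a result ranked #1 by one backend and #5 by another (1/61 + 1/65 ≈ 0.0318) beats a result ranked #4 by two backends (2/64 ≈ 0.0313).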
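
The formula above is small enough to sketch directly. The code below is a minimal illustration rather than the extension's implementation: `rrfMerge`, `RankedResult`, and the input shape are hypothetical, and ranks are assumed to be 1-based. The usage reproduces the example from the paragraph above.

```typescript
// Minimal RRF sketch; names and input shape are hypothetical.
interface SearchResult {
  url: string;
  title: string;
}

interface RankedResult extends SearchResult {
  score: number;      // sum of 1 / (K + rank) over the backends that returned this URL
  backends: string[]; // which backends returned it
}

const K = 60; // constant from the formula above

function rrfMerge(perBackend: Record<string, SearchResult[]>): RankedResult[] {
  const merged = new Map<string, RankedResult>();
  for (const [backend, results] of Object.entries(perBackend)) {
    results.forEach((result, index) => {
      const rank = index + 1; // assumed 1-based: the top result has rank 1
      const entry = merged.get(result.url) ?? { ...result, score: 0, backends: [] };
      entry.score += 1 / (K + rank);
      entry.backends.push(backend);
      merged.set(result.url, entry);
    });
  }
  // Sort by score, then by how many backends found the result.
  return Array.from(merged.values()).sort(
    (a, b) => b.score - a.score || b.backends.length - a.backends.length,
  );
}

// Helper that builds n unrelated placeholder results.
const filler = (prefix: string, n: number): SearchResult[] =>
  Array.from({ length: n }, (_, i) => ({
    url: `https://${prefix}.example/${i}`,
    title: `${prefix} ${i}`,
  }));

const resultA: SearchResult = { url: "https://a.example", title: "A" };
const resultB: SearchResult = { url: "https://b.example", title: "B" };

const ranked = rrfMerge({
  first: [resultA, ...filler("x", 2), resultB],  // A is #1, B is #4
  second: [...filler("y", 3), resultB, resultA], // B is #4, A is #5
});

// A scores 1/61 + 1/65, B scores 2/64, so A is listed first.
console.log(ranked.slice(0, 2).map((r) => `${r.title} ${r.score.toFixed(5)}`));
```

Only positions enter the score, which is why the method works across backends whose native relevance scores are not comparable.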

Security

API keys are stored in local config files only (~/.pi/agent/extensions/search.json or .pi/search.json), never sent to any third party besides the chosen backend
Env vars and shell commands are supported for credential resolution — the config file is trusted (you own it), but never commit plain API keys to version control
DuckDuckGo queries run in a spawned Python subprocess (abortable via signal)
All HTTP backends have a 30-second timeout; shell commands for credentials have a 5-second timeout
Error messages are sanitized — API response bodies are truncated and key-like patterns are redacted
The .pi/ directory is in .gitignore — never commit API keys to version control
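
As an illustration of the error sanitization mentioned above, the sketch below truncates long messages and redacts a few key-like patterns. It is an assumption-laden example, not the extension's actual rules: `sanitizeError`, the 200-character limit, and the three patterns are all hypothetical.

```typescript
// Hypothetical sanitizer; the limit and patterns are illustrative assumptions.
const MAX_ERROR_LENGTH = 200;

function sanitizeError(message: string, maxLength: number = MAX_ERROR_LENGTH): string {
  const redacted = message
    // Bearer tokens echoed back from request headers.
    .replace(/Bearer\s+[A-Za-z0-9._~+\/-]+=*/gi, "Bearer [REDACTED]")
    // key=value style query parameters such as api_key=...
    .replace(/\b(api[_-]?key|token|key)=([^&\s"']+)/gi, "$1=[REDACTED]")
    // sk- prefixed keys like the literal example in the credential table.
    .replace(/\bsk-[A-Za-z0-9_-]{8,}\b/g, "[REDACTED]");
  // Truncate after redaction so a cut-off key cannot slip past the patterns.
  return redacted.length > maxLength
    ? `${redacted.slice(0, maxLength)}... [truncated]`
    : redacted;
}

console.log(sanitizeError("invalid key sk-abcdef1234567890"));
// prints "invalid key [REDACTED]"
console.log(sanitizeError("GET /search?api_key=SECRET123&q=test failed"));
// prints "GET /search?api_key=[REDACTED]&q=test failed"
```

Redacting before truncating is a deliberate ordering in this sketch: truncating first could leave a partial key that no longer matches any pattern.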

Testing

# Run the full benchmark against all backends
node benchmark/benchmark.mjs

# Quick test Jina AI (with your free API key)
curl -s -H "Authorization: Bearer $JINA_API_KEY" "https://s.jina.ai/?q=test&format=json" | jq .

# Quick test Exa (set KEY to your Exa API key)
curl -X POST "https://api.exa.ai/search" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $KEY" \
  -d '{"query": "test", "numResults": 3, "contents": {"text": true}}'

# Quick test Perplexity Sonar
curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KEY" \
  -d '{"model": "sonar", "messages": [{"role": "user", "content": "test"}], "web_search_options": {"search_context_size": "low"}}'

# Quick test SearXNG (replace URL with your instance)
curl "http://localhost:8888/search?q=test&format=json&count=3"

Adding a new backend

Backends are registered via the BACKEND_DEFS registry in extensions/search-hub.ts. Define a search function and add one entry to the registry:

const BACKEND_DEFS: Record<string, BackendRunner> = {
  // ... existing entries
  mybackend: {
    needsKey: true,
    needsKeyFromConfig: false,
    needsInstanceUrl: false,
    label: "My Backend",
    setupLabel: "My Backend (free tier description)",
    search: async (query, numResults, { key, signal }) => {
      // searchMyBackend is the search function you define for this backend
      const result = await searchMyBackend(query, numResults, key!, signal);
      return { results: result.results };
    },
  },
};

The registry handles dispatching, key resolution, label formatting, and the setup menu, so no other edits are needed.

License

MIT

Proudly created with pi