qgrep-mcp

Indexed code search MCP server + Claude Code plugin. Orders of magnitude faster than ripgrep on large codebases.

An amortized cost estimator decides at query time whether building a qgrep index is worth it, based on file count (which correlates r=0.96 with ripgrep latency). The server works fully without qgrep installed: searches fall back to ripgrep, so qgrep is purely an enhancement.

Motivation

AI coding tools ship with ripgrep or similar linear-scan search. This works fine on small repos, but breaks down on large codebases:

Repository	Files	ripgrep (per search)	qgrep (per search)
home-assistant/core	24,718	~28s	~0.034s
rust-lang/rust	58,547	~60s	~0.034s
torvalds/linux	92,920	~92s	~0.161s

Each search blocks the agent's reasoning until it returns. Even with async execution, ripgrep saturates disk I/O scanning the same files repeatedly. An indexed search returns in tens to hundreds of milliseconds, even on the 93k-file Linux kernel.

Why not just fix it upstream? The models behind these coding tools are post-trained to use specific built-in tools like Grep and file search. Tool preferences get baked into the model weights during post-training, and system prompts reinforce them further by defining the available tool set. Users can't modify either. Even when an MCP tool like search_code is registered alongside built-in Grep, the model defaults to what it was post-trained on. We tested this directly: Claude Code ignores search_code 100% of the time when only the MCP server is present, with no steering mechanism.

This project bridges that gap by working at the layer users can control: hooks intercept tool calls before they execute, skills inject context that nudges model behavior at inference time, and agents constrain tool access so indexed search is the only option. No post-training needed, no system prompt changes, no waiting for upstream fixes.

Benchmarks

Tested on three real-world repos with cold disk cache (OS file cache cleared between runs, simulating a fresh session where the AI agent hasn't touched these files yet). This reflects real-world conditions since agents start fresh sessions and the OS evicts cached file data over time:

Repository	Files	Avg ripgrep	Avg qgrep	Speedup	Index build
home-assistant/core	24,718	27.6s	0.034s	812x	93s
rust-lang/rust	58,547	59.6s	0.034s	1,753x	83s
torvalds/linux	92,920	92.4s	0.161s	574x	236s
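
The index build is a one-time cost, so the useful question is how many searches it takes to pay for itself. The sketch below is illustrative arithmetic on the table above, not the project's estimator; the function name break_even_searches is hypothetical.

```python
import math

# (avg ripgrep s, avg qgrep s, index build s), copied from the table above.
BENCHMARKS = {
    "home-assistant/core": (27.6, 0.034, 93.0),
    "rust-lang/rust": (59.6, 0.034, 83.0),
    "torvalds/linux": (92.4, 0.161, 236.0),
}


def break_even_searches(ripgrep_s: float, qgrep_s: float, build_s: float) -> int:
    """Smallest n such that build_s + n * qgrep_s <= n * ripgrep_s."""
    saved_per_search = ripgrep_s - qgrep_s
    if saved_per_search <= 0:
        raise ValueError("indexing never pays off if qgrep is not faster")
    return math.ceil(build_s / saved_per_search)


for repo, (rg, qg, build) in BENCHMARKS.items():
    n = break_even_searches(rg, qg, build)
    print(f"{repo}: index pays for itself after {n} searches")
```

On these averages the index pays for itself within 2 to 4 searches on all three repos.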

Detailed results

rust-lang/rust (58,547 files):

Query	ripgrep	qgrep	Speedup
`TODO|FIXME`	59.65s	0.055s	1,085x
`fn main`	59.66s	0.027s	2,210x
`unsafe impl`	59.40s	0.018s	3,300x
`fn\s+\w+\(.*Result<`	36.85s	0.037s	990x
`pub\s+(unsafe\s+)?fn\s+\w+`	34.89s	0.041s	861x
`#\[derive\(.*Clone.*\)\]`	33.83s	0.027s	1,242x

Linux kernel (92,920 files):

Query	ripgrep	qgrep	Speedup
`TODO|FIXME`	65.04s	0.312s	208x
`int main`	107.97s	0.074s	1,459x
`static void`	104.30s	0.098s	1,064x
`static\s+const\s+struct\s+file_operations`	49.85s	0.258s	193x
`pr_err\(|pr_warn\(|pr_info\(`	47.66s	0.225s	212x
`MODULE_LICENSE\(`	51.55s	0.215s	240x

home-assistant/core (24,718 files):

Query	ripgrep	qgrep	Speedup
`TODO|FIXME`	36.53s	0.043s	850x
`async def`	22.96s	0.036s	638x
`class.*:`	23.30s	0.024s	971x
`async\s+def\s+async_setup_entry`	23.19s	0.027s	867x
`raise\s+HomeAssistantError`	3.43s	0.026s	132x
`CONF_\w+\s*=\s*"`	1.27s	0.017s	74x

What determines search speed?

File count is the dominant factor. Across our three benchmark repos (25k, 59k, 93k files), average cold-cache ripgrep latency scales nearly linearly with file count, at roughly 1 ms per file.
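
To make the linear relationship concrete, here is a minimal sketch that fits a least-squares line through the three per-repo averages from the benchmark table. With only three points this is a rough illustration, not the analysis behind the r=0.96 figure, and the helper names least_squares and predict_ripgrep_s are hypothetical.

```python
# Average cold-cache ripgrep latency per repo, from the benchmark table.
FILES = [24_718, 58_547, 92_920]
RIPGREP_S = [27.6, 59.6, 92.4]


def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(FILES, RIPGREP_S)
print(f"~{slope * 1000:.2f} ms per file, intercept {intercept:.1f} s")


def predict_ripgrep_s(file_count: int) -> float:
    """Extrapolate cold-cache ripgrep latency from the fitted line."""
    return slope * file_count + intercept
```

The fitted slope comes out a little under 1 ms per file, which is why a fixed file-count threshold works as a cheap proxy for expected ripgrep latency.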

Installation

Five options, from most to least automated. Pick the one that fits your workflow:

Option 1: Hook + MCP Server (hard redirect)

The fully automatic route. The hook intercepts every Grep call and redirects to search_code when it detects a large codebase. Claude doesn't need to be nudged or told anything; the hook handles it transparently.

git clone https://github.com/sumisingh10/qgrep-mcp.git
claude --plugin-dir ./qgrep-mcp

The skill and agent are also included but have no effect when the hook is active. They're harmless to leave in place.

How the hooks work:

A PreToolUse hook intercepts Grep before it runs and decides whether to redirect:

< 5k files → Grep runs as normal (ripgrep is fast enough even on cold cache)
5k-15k files → allows first 2 Grep calls to collect latency baselines, then redirects if slow
> 15k files → redirects immediately to search_code MCP tool
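
The three tiers above can be sketched as a single decision function. This is a minimal sketch, not the code in hooks/intercept_grep.py: the function name should_redirect is hypothetical, and the 2-second "slow" cutoff is an assumption, since the actual threshold is not documented here.

```python
from typing import Sequence

SMALL_REPO_MAX_FILES = 5_000    # below this, Grep always runs
LARGE_REPO_MIN_FILES = 15_000   # above this, always redirect
BASELINE_CALLS = 2              # gray-zone Grep calls allowed before deciding
SLOW_GREP_SECONDS = 2.0         # ASSUMPTION: the real "slow" cutoff is not documented


def should_redirect(file_count: int, grep_latencies_s: Sequence[float],
                    slow_threshold_s: float = SLOW_GREP_SECONDS) -> bool:
    """Return True to redirect Grep to search_code, False to let Grep run."""
    if file_count < SMALL_REPO_MAX_FILES:
        return False
    if file_count > LARGE_REPO_MIN_FILES:
        return True
    # Gray zone: let the first calls through to collect latency baselines.
    if len(grep_latencies_s) < BASELINE_CALLS:
        return False
    baseline = grep_latencies_s[:BASELINE_CALLS]
    # "Slow" here means every baseline call exceeded the cutoff (an assumption).
    return all(latency > slow_threshold_s for latency in baseline)
```

For example, should_redirect(10_000, [3.0]) lets Grep run because only one baseline has been collected, while should_redirect(10_000, [3.0, 4.0]) redirects.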

A PostToolUse hook runs after Grep completes and records the observed latency into the shared stats file. This feeds the estimator real timing data for smarter gray zone decisions without waiting for MCP tool calls.
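
As a rough illustration of the recording step, the sketch below appends one latency sample per Grep call to a JSON file. The file layout (a mapping from repo path to a list of recent latencies), the function name record_grep_latency, and the keep_last limit are all assumptions made for this example; the real stats file format may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def record_grep_latency(stats_path: Path, repo_root: str, latency_s: float,
                        keep_last: int = 20) -> list[float]:
    """Append one Grep latency sample for repo_root and return its samples."""
    try:
        stats = json.loads(stats_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        stats = {}  # first run or corrupt file: start fresh
    samples = stats.setdefault(repo_root, [])
    samples.append(round(latency_s, 3))
    del samples[:-keep_last]  # keep only the most recent samples
    # Write to a temp file and rename it, so a reader never sees a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=stats_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(stats, tmp)
    os.replace(tmp_name, stats_path)
    return samples
```

The write-then-rename step matters because the hook and the MCP server are separate processes that share the file.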

On first redirect, search_code auto-builds a qgrep index, then all subsequent searches use it.

Option 2: MCP Server + CLAUDE.md (manual nudge)

Register the MCP server and add a line to your CLAUDE.md telling the model to prefer search_code over built-in Grep. No plugin, no hook, no skill files needed.

git clone https://github.com/sumisingh10/qgrep-mcp.git
pip install -e ./qgrep-mcp
claude mcp add qgrep-mcp -- python -m qgrep_mcp

Then add to your project's CLAUDE.md:

When searching code, prefer the `search_code` MCP tool over built-in Grep. It uses an indexed backend that is orders of magnitude faster on large codebases.

This works because CLAUDE.md is loaded into context at the start of every session. The same approach works with AGENTS.md (Codex), .cursor/rules/*.mdc (Cursor), or .github/copilot-instructions.md (Copilot).

Option 3: Skill + MCP Server (soft nudge)

Loads the skill and MCP server but no hook. When Claude's task involves searching code ("find in files", "grep for", "search the codebase", etc.), the skill activates and nudges Claude toward search_code. Built-in Grep is not intercepted, so Claude may still use it for simple searches.

Zero overhead when the skill isn't triggered: only the metadata (~100 words) is always loaded; the full body is injected only when relevant.

The skill prompt lives at skills/code-search/SKILL.md. You can customize the trigger phrases or tool guidance there.

git clone https://github.com/sumisingh10/qgrep-mcp.git
claude --plugin-dir ./qgrep-mcp

The hook and agent are also included but unused by this option. They're harmless to leave in place.

Option 4: Agent + MCP Server (delegated search)

Loads the agent and MCP server. Claude can spawn the code-search agent for search-heavy tasks. The agent only has access to search_code, build_search_index, search_estimate, Read, and Glob (no built-in Grep), so it always uses indexed search.

Useful for exploratory tasks across large codebases where you want search delegated to a subagent that runs multiple indexed queries in parallel.

The agent definition lives at agents/code-search.md. You can adjust the tool list, model, or instructions there.

git clone https://github.com/sumisingh10/qgrep-mcp.git
claude --plugin-dir ./qgrep-mcp

The hook and skill are also included but unused by this option. They're harmless to leave in place.

Option 5: MCP Server only (not recommended)

Just the raw MCP tools with no steering. The model will not use these on its own. It always prefers built-in Grep over MCP tools. This option only works if you explicitly ask for search_code in every prompt.

git clone https://github.com/sumisingh10/qgrep-mcp.git
pip install -e ./qgrep-mcp
claude mcp add qgrep-mcp -- python -m qgrep_mcp

Why not standalone? We tested this across multiple sessions. Even with the MCP server registered, Claude defaults to built-in Grep 100% of the time. You need at least one steering mechanism to make indexed search actually get used.

Prerequisites

ripgrep → usually already available (Claude Code bundles it)

qgrep → optional but recommended for the speed gains. Install from releases:

# macOS
curl -sL https://github.com/zeux/qgrep/releases/download/v1.5/qgrep-macos.zip -o /tmp/qgrep.zip
unzip -o /tmp/qgrep.zip -d /tmp && chmod +x /tmp/qgrep && sudo mv /tmp/qgrep /usr/local/bin/

MCP Server tools

Tool	Description
`search_code`	Fast indexed code search with auto-selected backend
`build_search_index`	Manage index lifecycle (build/rebuild/status/delete)
`search_estimate`	Get indexing recommendation + stats for a directory

The estimator handles backend selection:

Small repos (< 5k files): always ripgrep (fast enough even on cold cache)
Large repos (> 15k files): build index immediately, use qgrep
Gray zone (5k-15k): collects latency baselines over the first 2 searches, indexes if ripgrep is consistently slow
Features qgrep can't handle (context lines, glob filters): always use ripgrep
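
Putting these rules together, backend selection might look like the sketch below. It is an illustration under stated assumptions, not the server's estimator: SearchRequest and choose_backend are hypothetical names, and the ordering of the checks (qgrep availability first, then unsupported features, then repo size) is a guess at a sensible precedence.

```python
import shutil
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRequest:
    pattern: str
    context_lines: int = 0       # lines of context requested around each match
    glob: Optional[str] = None   # file filter such as "*.py"


def choose_backend(request: SearchRequest, file_count: int,
                   gray_zone_slow: bool,
                   qgrep_available: Optional[bool] = None) -> str:
    """Return "ripgrep" or "qgrep" for one search request."""
    if qgrep_available is None:
        qgrep_available = shutil.which("qgrep") is not None
    if not qgrep_available:
        return "ripgrep"  # qgrep is optional; ripgrep is the fallback
    if request.context_lines > 0 or request.glob is not None:
        return "ripgrep"  # features qgrep can't handle
    if file_count < 5_000:
        return "ripgrep"
    if file_count > 15_000:
        return "qgrep"
    # Gray zone: index only if the ripgrep baselines were consistently slow.
    return "qgrep" if gray_zone_slow else "ripgrep"
```

Passing qgrep_available explicitly keeps the function easy to test on machines where qgrep is not installed.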

Indexes are maintained automatically. Stale indexes (where source files have been modified since the last build) are detected and rebuilt before searching. On server startup, previously-indexed repos are scanned and any stale indexes are rebuilt in the background so the first search of a new session hits a fresh index.
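
One simple way to detect a stale index is to compare the newest source file modification time against the time the index was built. The sketch below assumes that approach; the function names and the SKIP_DIRS set are hypothetical, and the real server may use a different signal. Note that a pure mtime check does not notice deleted files.

```python
import os
from pathlib import Path

# ASSUMPTION: directories not worth scanning; the real ignore rules may differ.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def newest_source_mtime(repo_root: Path) -> float:
    """Return the latest modification time (epoch seconds) under repo_root."""
    newest = 0.0
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            newest = max(newest, mtime)
    return newest


def index_is_stale(repo_root: Path, index_built_at: float) -> bool:
    """True if any source file changed after the index was built."""
    return newest_source_mtime(repo_root) > index_built_at
```

A caller would store index_built_at when the build finishes and rebuild whenever index_is_stale returns True.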

Using with other AI coding tools

The MCP server is the portable core. The hook, skill, and agent are Claude Code-specific steering mechanisms, but the server itself works with any MCP-compatible client.

Layer	Claude-specific?	Portable?
MCP Server (`search_code`, etc.)	No	Any MCP client
PreToolUse hook (`hooks/intercept_grep.py`)	Yes	No
PostToolUse hook (`hooks/record_grep_latency.py`)	Yes	No
Skill (`skills/code-search/SKILL.md`)	Yes (Claude plugin)	No
Agent (`agents/code-search.md`)	Yes (Claude plugin)	No

Note: Unlike Claude Code, most other tools don't have a built-in grep that takes priority over MCP tools. The MCP server alone may work fine without needing a hook or skill to steer the tool toward it.

Cross-tool concepts

Each AI coding tool has its own version of instruction files and MCP configuration:

Concept	Claude Code	Codex CLI	Cursor	Copilot (VS Code)
Instruction file	`CLAUDE.md`	`AGENTS.md`	`.cursor/rules/*.mdc`	`.github/copilot-instructions.md`
MCP config	`.mcp.json`	`~/.codex/config.toml`	`.cursor/mcp.json`	`.vscode/mcp.json`
Skills/nudges	`skills/*/SKILL.md`	`.agents/skills/*/SKILL.md`	Rules (glob-triggered)	N/A
Custom agents	`agents/*.md`	N/A	N/A	N/A

OpenAI Codex CLI

pip install -e ./qgrep-mcp

Add to ~/.codex/config.toml:

[mcp_servers.qgrep-mcp]
command = "python3"
args = ["-m", "qgrep_mcp"]

Codex also supports skills in the same directory structure. You can adapt the skill prompt from skills/code-search/SKILL.md into .agents/skills/qgrep-search/SKILL.md in your project to nudge Codex toward search_code.

Cursor

Add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "qgrep-mcp": {
      "command": "python3",
      "args": ["-m", "qgrep_mcp"]
    }
  }
}

You can also create a .cursor/rules/qgrep-search.mdc rule to nudge Cursor toward the MCP tool. Adapt the prompt from skills/code-search/SKILL.md.

GitHub Copilot (VS Code)

Add to .vscode/mcp.json:

{
  "servers": {
    "qgrep-mcp": {
      "command": "python3",
      "args": ["-m", "qgrep_mcp"]
    }
  }
}

Any MCP-compatible client

Install the package, then point your client at the stdio server:

pip install -e ./qgrep-mcp
python3 -m qgrep_mcp

All of the clients listed above require pip install -e ./qgrep-mcp first, so that python3 -m qgrep_mcp resolves without a PYTHONPATH override.

Running tests

pip install -e ".[dev]"
pytest tests/ -v

39 tests covering the estimator, search orchestrator, index management, warming, and hook logic.

Future: Semantic search

The current system handles exact pattern matching (regex). A natural extension is semantic search — finding code by meaning rather than exact text. For example, searching for "authentication middleware" would find functions like verify_jwt_token() or check_session_cookie() even though those strings don't contain the search terms.

This could work by embedding code chunks at index build time (using a local model like nomic-embed-text or an API), storing vectors alongside the qgrep index, and adding a semantic_search tool that queries the vector store. The estimator would route between regex search and semantic search based on the query type.

This is not currently implemented but is a logical next step for the project.

Acknowledgments

Thanks to Derek Feriancek and Michael Sklar for helping me work through the core problem: retraining models to prefer new tools is not feasible for everyone, and users can't modify system prompts. A runtime harness that sits between the model and its built-in tools is one abstracted solution to that.