scholar-agent
Health Warnings
- License — MIT license
- Description — Repository has a description
- Active repo — Last pushed today
- Low visibility — Only 6 GitHub stars
Code Warnings
- network request — Outbound network request in mcp_server.py
Permissions Passed
- Permissions — No dangerous permissions requested
This tool is a knowledge agent for LLMs that performs automated online research and academic paper analysis, storing the results locally so the AI becomes smarter over time.
Security Assessment
The overall risk is Medium. The repository contains no hardcoded secrets and does not request dangerous system permissions. However, the automated nature of this agent poses inherent risks. It actively makes outbound network requests to external search engines and academic APIs like arXiv and Semantic Scholar. Because it is designed to download, extract, and process external content (such as extracting figures from PDFs and source archives), it handles untrusted external data. Additionally, the tool requires local file system access to create Markdown knowledge cards and build BM25 indexes. Users should ensure it runs in a restricted environment, as processing maliciously crafted external documents could potentially lead to local vulnerabilities.
Quality Assessment
The project is actively maintained, with its most recent code push happening today. It uses a standard permissive MIT license and includes clear, comprehensive documentation. However, it currently suffers from very low community visibility. With only 6 GitHub stars, the codebase has not been broadly examined or battle-tested by the open-source community. Trust should currently be placed primarily in the active maintenance rather than widespread peer review.
Verdict
Use with caution: The MIT license and active maintenance are positives, but the lack of community oversight combined with active network requests and local processing of untrusted external data requires running it in an isolated environment.
Knowledge agent for LLMs — online research + local accumulation, gets smarter over time. MCP-ready.
Scholar Agent
General-purpose LLMs are often inaccurate and outdated in specialized domains. Scholar Agent combines online research + local knowledge accumulation into a sustainable knowledge flywheel, making your AI smarter in your domain over time. It also builds a human-readable knowledge base for quick learning. Integrates seamlessly with Claude Code and VS Code Copilot via MCP.
What It Does
```
Your question
      │
      ▼
Online research (AI agent search + SearXNG + academic APIs)
      │
      ▼
Structured synthesis (with citations, confidence, uncertainty)
      │
      ▼
Local accumulation (Markdown knowledge cards + BM25 index)
      │
      ▼
Next question: AI checks local first ── hit? ──► use directly, fast & accurate
      │ miss
      ▼
Research again → accumulate → reindex ──► knowledge base keeps growing
```
Each round compounds. Knowledge cards have full lifecycle management: draft → reviewed → trusted → stale → deprecated.
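The lifecycle above can be sketched as a small state machine. This is an illustration of the documented flow only; the state names come from the README, but the transition rules are assumptions (the real logic lives in the repo's governance scripts):

```python
from enum import Enum

class CardState(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    TRUSTED = "trusted"
    STALE = "stale"
    DEPRECATED = "deprecated"

# Hypothetical forward-only transitions mirroring the documented flow:
# draft -> reviewed -> trusted -> stale -> deprecated
TRANSITIONS = {
    CardState.DRAFT: {CardState.REVIEWED},
    CardState.REVIEWED: {CardState.TRUSTED},
    CardState.TRUSTED: {CardState.STALE},
    CardState.STALE: {CardState.DEPRECATED},
    CardState.DEPRECATED: set(),
}

def advance(state: CardState, target: CardState) -> CardState:
    """Move a card to `target` if the transition is allowed."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state.value} -> {target.value}")
    return target
```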
Academic Research Pipeline
Scholar Agent includes a comprehensive academic paper research pipeline:
- Paper Search — Search papers from arXiv, DBLP, and Semantic Scholar. Filter by top conferences (CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, ACL, EMNLP, MICCAI)
- Smart Scoring — Four-dimensional scoring engine (relevance, recency, popularity, quality) ranks papers by your research interests
- Deep Analysis Notes — Auto-generate Obsidian-style markdown notes with 20+ sections and `<!-- LLM: -->` placeholders for AI-assisted completion
- Figure Extraction — Extract images from arXiv source archives and PDFs (via PyMuPDF)
- Daily Recommendations — Automated daily paper search, scoring, deduplication, and recommendation note generation
- Paper → Knowledge Card — Convert paper analyses into knowledge cards that feed back into the knowledge flywheel
- Keyword Auto-Linking — Scan notes for technical terms and create `[[wiki-links]]` automatically
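To make the "four-dimensional scoring" concrete, here is a minimal weighted-sum sketch. The weight values and normalization are assumptions for illustration; the actual engine is in `scripts/academic/scoring.py` and is configured via `.lore.json`:

```python
# Hypothetical weights; the real values are configurable per project.
WEIGHTS = {"relevance": 0.4, "recency": 0.2, "popularity": 0.2, "quality": 0.2}

def score_paper(dims: dict) -> float:
    """Combine per-dimension scores (each assumed in [0, 1]) into one ranking score."""
    return sum(WEIGHTS[name] * dims.get(name, 0.0) for name in WEIGHTS)

# Rank a toy candidate list by combined score, highest first.
papers = [
    {"title": "A", "dims": {"relevance": 0.9, "recency": 0.8, "popularity": 0.3, "quality": 0.7}},
    {"title": "B", "dims": {"relevance": 0.5, "recency": 0.9, "popularity": 0.9, "quality": 0.6}},
]
ranked = sorted(papers, key=lambda p: score_paper(p["dims"]), reverse=True)
```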
Quick Start
Use as a standalone project
```shell
# Clone and install
git clone https://github.com/zfy465914233/scholar-agent.git
cd scholar-agent
pip install -r requirements.txt

# Build the knowledge index
python scripts/local_index.py --output indexes/local/index.json

# (Optional) Start SearXNG for web research
docker compose up -d
```
MCP configs are pre-configured:
- Claude Code: `.mcp.json` is ready. `cd` into the project and start Claude Code.
- VS Code Copilot: `.vscode/mcp.json` is ready. Open the project and enable agent mode.
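For orientation, a Claude Code `.mcp.json` registering a stdio Python server typically has roughly this shape. This is illustrative only; the file committed in the repo is authoritative:

```json
{
  "mcpServers": {
    "scholar-agent": {
      "command": "python",
      "args": ["mcp_server.py"]
    }
  }
}
```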
Embed into an existing project
```shell
cp -r scholar-agent/ your-project/scholar-agent/
cd your-project && python scholar-agent/setup_mcp.py
```
Auto-generates config. Knowledge lives in your project, not inside scholar-agent.
MCP Tools
Core Tools (always available)
| Tool | Description |
|---|---|
| `query_knowledge` | Search local knowledge base |
| `save_research` | Save structured research results as a knowledge card |
| `list_knowledge` | Browse all knowledge cards |
| `capture_answer` | Quick-capture a Q&A pair as a draft card |
| `ingest_source` | Ingest a URL or raw text into the knowledge base |
| `build_graph` | Generate an interactive knowledge graph (vis.js) |
Academic Tools (set `LORE_ACADEMIC=1` to enable)
| Tool | Description |
|---|---|
| `search_papers` | Search arXiv + Semantic Scholar with 4-dim scoring |
| `search_conf_papers` | Search conference papers via DBLP + S2 enrichment |
| `analyze_paper` | Generate deep-analysis markdown notes (20+ sections) |
| `extract_paper_images` | Extract figures from arXiv source / PDF |
| `paper_to_card` | Convert paper analysis into a knowledge card |
| `daily_recommend` | Daily paper recommendation workflow |
| `link_paper_keywords` | Auto-link keywords as `[[wikilinks]]` in notes |
Configuration
.lore.json
The .lore.json file configures knowledge paths and academic research settings. See .lore.example.json for a full example with comments.
Key sections:
- `knowledge_dir` — Path to knowledge cards directory
- `index_path` — Path to BM25 search index
- `academic.research_interests` — Your research domains, keywords, and arXiv categories
- `academic.scoring` — Paper scoring weights and dimensions
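An illustrative `.lore.json` built from the documented keys. Field names nested below the documented ones are assumptions; `.lore.example.json` in the repo is the authoritative reference:

```json
{
  "knowledge_dir": "knowledge/",
  "index_path": "indexes/local/index.json",
  "academic": {
    "research_interests": {
      "keywords": ["medical image segmentation"],
      "arxiv_categories": ["cs.CV"]
    },
    "scoring": {
      "weights": {"relevance": 0.4, "recency": 0.2, "popularity": 0.2, "quality": 0.2}
    }
  }
}
```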
Environment Variables
Copy .env.example to .env and configure:
| Variable | Required | Description |
|---|---|---|
| `LORE_ACADEMIC` | No | Set to `1` to enable academic tools |
| `S2_API_KEY` | No | Semantic Scholar API key (free to obtain) |
| `LLM_API_KEY` | No | LLM API key for the advanced synthesis pipeline |
| `SEARXNG_BASE_URL` | No | SearXNG URL for web research (default: `http://localhost:8080`) |
Project Structure
```
scholar-agent/
├── mcp_server.py                 # MCP server (13 tools)
├── setup_mcp.py                  # Embed into existing projects
├── pyproject.toml                # Package configuration
├── docker-compose.yml            # SearXNG
├── .lore.json                    # Project & academic configuration
├── schemas/                      # Answer + evidence JSON schemas
├── scripts/
│   ├── academic/                 # Academic research modules
│   │   ├── arxiv_search.py       # arXiv + Semantic Scholar search
│   │   ├── conf_search.py        # Conference paper search (DBLP)
│   │   ├── paper_analyzer.py     # Deep-analysis note generation
│   │   ├── scoring.py            # 4-dim paper scoring engine
│   │   ├── image_extractor.py    # Figure extraction from PDFs
│   │   ├── note_linker.py        # Wiki-link discovery + keyword linking
│   │   └── daily_workflow.py     # Daily recommendation pipeline
│   ├── lore_config.py            # Configuration reader
│   ├── local_index.py            # BM25 index builder
│   ├── local_retrieve.py         # Knowledge retrieval
│   ├── close_knowledge_loop.py   # Knowledge card builder
│   └── ...                       # Research, synthesis, governance, graph
├── knowledge/                    # Knowledge cards (gitignored, user-generated)
├── indexes/                      # Generated indexes (gitignored)
└── tests/                        # 247 tests
```
More Features
- Multi-perspective research — Parallel research from 5 perspectives (academic, technical, applied, contrarian, historical)
- Obsidian compatible — Standard Markdown + YAML frontmatter + `[[wiki-links]]`
- Knowledge governance CLI — Validate frontmatter, detect orphaned cards, find duplicates, manage lifecycle
- Provider fault tolerance — Each search source fails independently; falls back to local retrieval when offline
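The fault-tolerance idea can be sketched as follows. This is a simplification with hypothetical provider callables, not the repo's actual implementation; `local_retrieve` here is a stand-in for the real `scripts/local_retrieve.py`:

```python
from typing import Callable, Dict, List

def local_retrieve(query: str) -> List[str]:
    """Stand-in for the local BM25 retrieval used as an offline fallback."""
    return [f"local hit for {query!r}"]

def search_all(query: str, providers: Dict[str, Callable[[str], List[str]]]) -> List[str]:
    """Query every provider; one failing source never aborts the others."""
    results: List[str] = []
    for name, provider in providers.items():
        try:
            results.extend(provider(query))
        except Exception:
            # Each source fails independently: record nothing, keep going.
            continue
    if not results:
        # Offline or all providers down: fall back to the local index.
        results = local_retrieve(query)
    return results
```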
Testing
```shell
python -m pytest tests/ -v
```
247 tests, ~13s. No external services needed.
License
MIT — see LICENSE.