estonian-mcp
Get better Estonian writing results from your AI agent. MCP server wrapping EstNLTK — drop-in for Claude Cowork, claude.ai, Claude Desktop, Cursor. Free hosted endpoint or self-host. 10 tools: spell-check, morphology, lemmas, NER, syllabification, WordNet synonyms, fastText related-words, register classifier.
Claude is quite bad at Estonian, so this MCP is here to fix that. Give it a shot.
A small Model Context Protocol server that exposes
EstNLTK — the Estonian NLP toolkit —
as tools any LLM client can call in real time. Hand it Estonian text,
get back correct lemmas, morphology, POS tags, spell-check + suggestions,
syllables, named entities, WordNet synonyms, fastText-based related
words, and a register hint (formal vs colloquial).
If your AI agent has to draft, edit, or proofread Estonian, this wires
in ground truth so it stops guessing on the mechanical layer
(spelling, case forms, conjugation) and gives it real Estonian
synonyms instead of inventing them.
Three ways to use it:
- 👉 Paste a URL into your Claude app: the easiest path, no terminal, no install. See Get started in 30 seconds below.
- One-click on Smithery: install from the estonian-mcp listing.
- Self-host: clone, run locally as stdio, or deploy your own container to Fly.io / any host. See Self-host (advanced).
What it does
| Tool | What it does |
|---|---|
| tokenize(text) | Split text into sentences and words |
| analyze_morphology(text) | Lemma, POS, form, root, ending, clitic, compound parts per word |
| lemmatize(text) | Just the dictionary form per word |
| pos_tag(text) | Just the part-of-speech tag per word |
| spell_check(text) | Spelling check + correction suggestions |
| syllabify(word) | Syllables with quantity + accent |
| named_entities(text) | People / places / organisations |
| synonyms(word) | Synsets from Estonian WordNet: synonymous lemmas + definition + examples per word sense |
| find_related_words(word) | Top-N semantically nearby words via fastText embeddings (semantically related, not always synonymous) |
| classify_register(text) | Coarse formal/colloquial register hint with matched markers (heuristic, phase 1) |
POS tag set (main tags): S=noun, V=verb, A=adj, P=pron, D=adv, K=adp, J=conj, N=numeral, I=interj, Y=abbrev, X=foreign, Z=punct.
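If you post-process pos_tag or analyze_morphology output in your own scripts, a lookup table keeps the single-letter tags readable. This is a minimal Python sketch that uses only the letters listed above; the (word, tag) pair shape is an assumption about how you have extracted the results, not the server's actual response format, and describe_tags is a hypothetical helper name.

```python
# Tag letters and meanings copied from the list above.
# Hypothetical helper: the (word, tag) input shape is an assumption.
POS_NAMES = {
    "S": "noun", "V": "verb", "A": "adjective", "P": "pronoun",
    "D": "adverb", "K": "adposition", "J": "conjunction",
    "N": "numeral", "I": "interjection", "Y": "abbreviation",
    "X": "foreign", "Z": "punctuation",
}


def describe_tags(tagged: list[tuple[str, str]]) -> list[str]:
    """Turn (word, tag) pairs into readable 'word: name' strings."""
    out = []
    for word, tag in tagged:
        # Letters not in the table are kept verbatim instead of raising.
        out.append(f"{word}: {POS_NAMES.get(tag, tag)}")
    return out


if __name__ == "__main__":
    print(describe_tags([("räägivad", "V"), ("keelt", "S"), (".", "Z")]))
```

Falling back to the raw letter means tags outside this short list still pass through instead of crashing your script.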
✨ Get started in 30 seconds (no install)
This section is for everyone — including if you've never opened a
terminal in your life. You'll be done before your tea is steeped.
The trick is that we run the server for you on the public internet at https://estonian-mcp.fly.dev/mcp. You just need to tell your Claude
app to talk to it. Pick the app you use:
In Claude Cowork
- Open Cowork and click your profile / Settings.
- Find Connectors in the sidebar.
- Click Add custom connector.
- Paste this URL into the URL field:
  https://estonian-mcp.fly.dev/mcp
- Leave any "Authentication" / "API key" / "Bearer token" fields empty. The server is public, so no token is needed.
- Click Save / Connect.
- Done. Start a new chat and write in Estonian — proofread an
email, study a paragraph, draft a reply. Claude will reach for
the EstNLTK tools whenever it needs to verify spelling, lemmas,
or morphology rather than guessing.
In claude.ai (web Claude)
- Click your profile in the bottom-left → Settings.
- Find Connectors (sometimes called Custom Integrations).
- Click Add custom connector.
- Paste:
  https://estonian-mcp.fly.dev/mcp
- Authentication: none (leave fields blank).
- Save. The new tools appear in your tool tray.
In Claude Desktop
If your Claude Desktop has a Settings → Connectors menu (newer
versions), follow the same steps as Cowork above.
If it doesn't, you have an older Desktop that needs a JSON config
file edit — see Self-host (advanced) for the
local-stdio path, which works on every version.
Don't see your client here?
Any tool that supports MCP over HTTPS can connect: just point it at https://estonian-mcp.fly.dev/mcp with no auth. If your client only
speaks stdio, or you'd rather run the server on your own machine (the usual route for Cursor, VS Code MCP, Continue, Zed, and Claude Code), jump
to the local-install path in Self-host.
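For scripts, or clients without a Connectors UI, here is a minimal sketch of talking to the hosted endpoint over MCP's streamable HTTP transport using only the Python standard library. It assumes the 2025-03-26 protocol revision (initialize, then notifications/initialized, then tools/call), that the server may return an Mcp-Session-Id header to echo back, and that lemmatize takes a single text argument as the tool table suggests. The helper names and client name are hypothetical; in practice the official MCP Python SDK handles this handshake for you.

```python
import json
import urllib.request

ENDPOINT = "https://estonian-mcp.fly.dev/mcp"
BASE_HEADERS = {
    "Content-Type": "application/json",
    # Streamable HTTP servers may reply with plain JSON or an SSE stream.
    "Accept": "application/json, text/event-stream",
}


def rpc(method, params=None, msg_id=None):
    """Build a JSON-RPC 2.0 message; leave msg_id out for notifications."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if msg_id is not None:
        msg["id"] = msg_id
    return msg


def parse_body(content_type, body, msg_id):
    """Pull the response matching msg_id out of a JSON or SSE body.

    Simplification: assumes each SSE event carries its JSON on one data: line.
    """
    if not content_type.startswith("text/event-stream"):
        return json.loads(body)
    for line in body.splitlines():
        if line.startswith("data:"):
            msg = json.loads(line[len("data:"):].strip())
            if msg.get("id") == msg_id:
                return msg
    raise ValueError(f"no response with id {msg_id} in SSE body")


def post(msg, session_id=None):
    """POST one message; return (session id, content type, body text)."""
    headers = dict(BASE_HEADERS)
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    req = urllib.request.Request(
        ENDPOINT, data=json.dumps(msg).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        sid = resp.headers.get("Mcp-Session-Id", session_id)
        ctype = resp.headers.get("Content-Type", "")
        return sid, ctype, resp.read().decode("utf-8")


def call_tool(name, arguments):
    """Handshake, then invoke one tool and return its JSON-RPC response."""
    init = rpc(
        "initialize",
        {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "readme-sketch", "version": "0.1"},
        },
        msg_id=1,
    )
    sid, _, _ = post(init)
    post(rpc("notifications/initialized"), sid)
    _, ctype, body = post(rpc("tools/call", {"name": name, "arguments": arguments}, msg_id=2), sid)
    return parse_body(ctype, body, 2)


# Usage (needs network access):
# print(call_tool("lemmatize", {"text": "Tallinnas elavad eestlased."}))
```

The SSE branch matters because a server is free to stream its answer even for a single call; the JSON branch covers servers that reply directly.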
💡 Pro tip — teach Claude your Estonian alongside the MCP
This MCP gives Claude correct linguistics: real lemmas, real case
forms, real spelling. What it can't do is teach Claude your voice —
the register, idioms, and tone you actually want when writing.
You handle the voice; the MCP handles the correctness. Layer them.
A few things to add to your Claude project / custom instructions /
system prompt to get this right:
- Set the register. "Always reply in formal officialese Estonian for legal and government topics, and in conversational Tallinn speech for chat replies. Never mix the two in one message."
- Pin the dialect / region. "I'm from Tartu. Prefer southern Estonian phrasings where there's a choice (e.g. 'kus sa lähed' rather than 'kuhu sa lähed' for casual speech)."
- Show your tone with examples. Paste 3–4 short paragraphs of your own writing into the project instructions and ask Claude to match that voice. Real examples beat any abstract description.
- Anchor common mistakes. "You always confuse kasutama (to use) with käsitlema (to handle / to deal with). Double-check those with the lemmatize tool before sending."
- Direct the MCP explicitly when it matters. "Before sending any Estonian email, run spell_check on every word. Show me misspelled words with suggestions before drafting."
- Use classify_register as a sanity check. "After drafting, run classify_register on the final text and warn me if it lands in 'formal' or 'colloquial' when I asked for the opposite." The classifier is coarse, but it does flag obvious drift into officialese (käesolev, vastavalt, sätestama) or slang (mõnus, vinge, kuule).
- Use synonyms to break repetition. "This newsletter uses kasutama four times. Look up synonyms via the MCP and suggest natural-sounding swaps." You'll get real Estonian alternatives with definitions, not invented ones.
- Use find_related_words for richer rewrites. "What words pattern with kohv in Estonian? Use that to suggest three alternative phrasings for our café-launch ad copy." This is fastText-based, so it surfaces near-neighbours that aren't strict synonyms, which helps when you want adjacent concepts rather than same-meaning swaps. (Quick rule of thumb: synonyms for "say the same thing differently"; find_related_words for "what else belongs in this conceptual space.")
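To see why the register classifier is described as coarse, here is a toy marker-based version in Python. It is not the server's classify_register: the marker stems are just the examples above, and prefix matching on lowercased words stands in for the lemma lookup EstNLTK would provide, so it will miss or over-match some forms. toy_register is a hypothetical name.

```python
import re

# Illustrative marker stems taken from the examples above; the real tool
# uses its own marker lists and EstNLTK lemmas, not prefix matching.
FORMAL = ("käesolev", "vastavalt", "sätesta")
COLLOQUIAL = ("mõnus", "vinge", "kuule")


def toy_register(text: str) -> dict:
    """Crude formal/colloquial hint from prefix-matched marker words."""
    words = re.findall(r"\w+", text.lower())  # \w is Unicode-aware (õ, ä, ö, ü)
    formal = [w for w in words if w.startswith(FORMAL)]
    colloquial = [w for w in words if w.startswith(COLLOQUIAL)]
    if len(formal) > len(colloquial):
        label = "formal"
    elif len(colloquial) > len(formal):
        label = "colloquial"
    else:
        label = "neutral"
    return {"register": label, "formal": formal, "colloquial": colloquial}


if __name__ == "__main__":
    print(toy_register("Vastavalt käesolevale lepingule sätestatakse tingimused."))
    print(toy_register("Kuule, see oli vinge ja mõnus õhtu!"))
```

Returning the matched words alongside the label mirrors the "with matched markers" behaviour the tool table describes, which is what lets you check why a draft was flagged.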
The MCP catches misspelled words and invented case forms; your
prompt drives the style. Together they make Claude actually useful
for writing in Estonian, not just plausible-looking.
How to prompt it once it's connected
Most prompts don't need to mention the tools by name — Claude picks
the right one. A few patterns that work especially well:
- Proofread this Estonian email and use spell_check on any words you're unsure about: <text>
- Lemmatize this Estonian paragraph, then translate the lemmas to English so I can study vocabulary: <text>
- Analyze the morphology of this sentence and explain the case markings: "Tallinnas elavad eestlased räägivad eesti keelt."
- Extract the people and places from this Estonian news article, then summarise in one paragraph.
- This Estonian draft uses "kasutama" three times. Look up synonyms via the MCP and rewrite each occurrence with a natural-sounding alternative that preserves the meaning.
- Classify the register of this draft. If it scores formal, soften it for a casual newsletter audience. If it scores colloquial, tighten it for a B2B email.
The model calls the tool, gets authoritative output, and bases its
response on that, so lemmas and case forms come from EstNLTK instead of being guessed.
All clients at a glance
| Client | No-install path | Local-install path |
|---|---|---|
| Claude Cowork | ✅ Paste URL | ✅ stdio via JSON |
| Claude Desktop | ✅ Paste URL (newer) | ✅ stdio via JSON |
| claude.ai web | ✅ Paste URL | — |
| Claude Code (CLI) | — | ✅ claude mcp add ... |
| Cursor | — | ✅ stdio via JSON |
| VS Code MCP / Continue / Zed | — | ✅ stdio via JSON |
"No-install path" = paste https://estonian-mcp.fly.dev/mcp in the
client's Connectors UI. "Local-install path" = clone the repo and
point the client at python server.py.
Self-host (advanced)
The hosted instance is convenient, but if you'd rather run your own
(privacy, latency, custom auth, offline use), the same one-file
server works locally and as a container.
Run locally as stdio (zero network)
EstNLTK requires Python 3.10–3.13.
git clone https://github.com/silly-geese/estonian-mcp.git
cd estonian-mcp
uv sync
uv run python tests/test_smoke.py # verify
Then wire it into your client.
Claude Code:
claude mcp add estnltk -- /absolute/path/to/uv \
--directory /absolute/path/to/estonian-mcp \
run python server.py
Claude Desktop / Cowork (local mode): edit ~/Library/Application Support/Claude/claude_desktop_config.json on macOS (%APPDATA%\Claude\claude_desktop_config.json on Windows), then restart the app:
{
"mcpServers": {
"estnltk": {
"command": "/absolute/path/to/uv",
"args": [
"--directory", "/absolute/path/to/estonian-mcp",
"run", "python", "server.py"
]
}
}
}
Cursor — same JSON shape in ~/.cursor/mcp.json.
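If the config file already lists other servers, merge the entry in rather than overwriting the file. A minimal Python sketch, assuming the mcpServers shape shown above (which both Claude Desktop and Cursor use); add_estnltk_server and update_config_file are hypothetical helpers, and the paths in the usage comment are placeholders.

```python
import json
from pathlib import Path


def add_estnltk_server(config: dict, uv_path: str, repo_path: str) -> dict:
    """Return a copy of an MCP client config with the estnltk stdio entry added."""
    merged = json.loads(json.dumps(config))  # deep copy of plain JSON data
    servers = merged.setdefault("mcpServers", {})
    servers["estnltk"] = {
        "command": uv_path,
        "args": ["--directory", repo_path, "run", "python", "server.py"],
    }
    return merged


def update_config_file(path: Path, uv_path: str, repo_path: str) -> None:
    """Read (or start) a config file, add the entry, and write it back."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_estnltk_server(current, uv_path, repo_path)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")


# Usage (placeholder paths):
# update_config_file(Path.home() / ".cursor" / "mcp.json",
#                    "/absolute/path/to/uv", "/absolute/path/to/estonian-mcp")
```

Other top-level keys and existing servers are left untouched; only the estnltk entry is added or replaced.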
Run as a remote server (HTTP)
The same server.py speaks streamable-http over the network.
Two auth postures:
- Public mode (ESTNLTK_MCP_PUBLIC_MODE=1): no bearer token, per-IP rate limit (default 120/min). This is how the silly-geese hosted instance runs.
- Bearer mode (default): every request must carry Authorization: Bearer <token> (or Smithery's ?config=<base64>); per-token rate limit (60/min). Refuses to start without an ESTNLTK_MCP_AUTH_TOKEN of at least 16 chars.
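Both postures cap request rates per key: the client IP in public mode, the token in bearer mode. A sliding-window limiter is one common way to do that; this Python sketch illustrates the idea under that assumption and is not the server's actual implementation. SlidingWindowLimiter is a hypothetical name; the 120/min and 60/min figures come from this README.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self.hits[key]
        # Forget requests that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected requests are not recorded
        q.append(now)
        return True


public_limiter = SlidingWindowLimiter(limit=120)  # keyed by client IP
bearer_limiter = SlidingWindowLimiter(limit=60)   # keyed by token
```

A sliding window avoids the double burst a fixed per-minute counter allows around the minute boundary; the cost is one timestamp per recent request per key, and a production version would also evict keys that go idle.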
Fly.io public deployment (matches silly-geese):
fly auth login
fly apps create my-estonian-mcp
fly deploy
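fly.toml already sets ESTNLTK_MCP_PUBLIC_MODE=1. If the app name in fly.toml differs from the one you just created, edit it or run fly deploy --app my-estonian-mcp instead. Endpoint: https://my-estonian-mcp.fly.dev/mcp.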
Fly.io with bearer auth: remove ESTNLTK_MCP_PUBLIC_MODE from fly.toml's [env] block, then:
TOKEN="$(python3 -c 'import secrets;print(secrets.token_urlsafe(32))')"
echo "$TOKEN"   # save it: clients send it as Authorization: Bearer <token>
fly secrets set ESTNLTK_MCP_AUTH_TOKEN="$TOKEN"
fly deploy
Generic Docker (any container host):
# Public
docker run -p 8081:8081 -e ESTNLTK_MCP_PUBLIC_MODE=1 \
ghcr.io/silly-geese/estonian-mcp # or build from source
# Bearer: generate the token first so you can hand it to clients
TOKEN="$(python3 -c 'import secrets;print(secrets.token_urlsafe(32))')"
echo "$TOKEN"
docker run -p 8081:8081 -e ESTNLTK_MCP_AUTH_TOKEN="$TOKEN" \
ghcr.io/silly-geese/estonian-mcp
Smithery auto-builds from smithery.yaml and hosts the image
for you. Fork, connect on Smithery,
deploy. The shipped configSchema is empty (one-click install)
because the deployment runs in public mode; flip it back if you fork
to a bearer-mode setup.
Security
- stdio mode: pure local subprocess. No network egress, no shell exec, no fs writes, no telemetry.
- HTTP / public mode: no auth required (intentional for the free public service). Per-IP rate limit (120/min default). Same hardening as bearer mode: no shell exec, no fs writes, no telemetry, size-bounded inputs.
- HTTP / bearer mode: ESTNLTK_MCP_AUTH_TOKEN (≥16 chars) required; the server refuses to start without it. Bearer auth on every request, constant-time comparison, per-token rate limit (60/min). /health is the only unauthenticated path.
- Common to all HTTP: no request or token logging. proxy_headers=True so client IPs reflect the originator, not the platform's edge.
- Inputs: 100 KB cap per text tool, 200 chars for syllabify. Oversized inputs return a structured error rather than hanging.
- Supply chain: deps pinned + hashed in uv.lock. Dependabot watches pip + GitHub Actions weekly. CI runs smoke + HTTP tests + Docker build/boot on Python 3.11 and 3.13 on every push.
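Three of the hardening points above are easy to get subtly wrong, so here is a Python sketch of each: the minimum-length startup check, a constant-time bearer comparison with hmac.compare_digest, and an input-size guard that returns a structured error. It assumes the 100 KB cap is measured in UTF-8 bytes (1 KB = 1024 bytes) and that syllabify's cap counts characters; the function names and the error dict's fields are hypothetical, not the server's API.

```python
import hmac

MIN_TOKEN_LEN = 16
MAX_TEXT_BYTES = 100 * 1024  # assumption: "100 KB" means UTF-8 bytes
MAX_SYLLABIFY_CHARS = 200


def require_token(token):
    """Refuse to start bearer mode without a long-enough token."""
    if token is None or len(token) < MIN_TOKEN_LEN:
        raise SystemExit("ESTNLTK_MCP_AUTH_TOKEN must be set (>= 16 chars)")
    return token


def authorized(header, expected: str) -> bool:
    """Constant-time check of an 'Authorization: Bearer <token>' value."""
    if not header or not header.startswith("Bearer "):
        return False
    presented = header[len("Bearer "):].encode("utf-8")
    # compare_digest on bytes: timing does not depend on where values differ.
    return hmac.compare_digest(presented, expected.encode("utf-8"))


def check_input(tool: str, text: str):
    """Return a structured error dict for oversized input, else None."""
    if tool == "syllabify":
        if len(text) > MAX_SYLLABIFY_CHARS:
            return {"error": "input_too_large", "limit": MAX_SYLLABIFY_CHARS, "unit": "chars"}
    elif len(text.encode("utf-8")) > MAX_TEXT_BYTES:
        return {"error": "input_too_large", "limit": MAX_TEXT_BYTES, "unit": "bytes"}
    return None
```

Comparing bytes rather than str keeps compare_digest from raising a TypeError on non-ASCII input. The secrets.token_urlsafe(32) command used in the deploy steps produces a 43-character token, well above the 16-character floor.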
Full threat model and disclosure path: SECURITY.md.
Privacy policy (what we receive, what we don't store): PRIVACY.md.
Terms of service for the hosted endpoint: TERMS.md.
Notes
- Most EstNLTK models (morph, NER, spell-check) ship inside the wheel: no runtime downloads.
- WordNet is a separate ~26 MB resource (used by synonyms); the Docker image pre-downloads it at build time so the first call doesn't pause to fetch it.
- The fastText model used by find_related_words is a separate ~22 MB compressed resource (CC-BY-SA-3.0; see NOTICE for attribution); also pre-downloaded at image-build time.
- Heavy neural taggers (estnltk_neural, BERT-based NER) are intentionally not pulled in; this server stays lean and fast.
- First call after server start incurs a one-time tag-layer load (~1–2 s). Subsequent calls are millisecond-scale.
- The hosted Fly instance scales to zero when idle; the first request after a quiet period takes ~5 s, then everything is fast again.
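The one-time tag-layer load mentioned above follows the usual lazy-initialization pattern: build the expensive object on first use and cache it. A Python sketch that assumes nothing about EstNLTK's API; get_tagger is hypothetical and the sleep stands in for the real load.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def get_tagger():
    """Build the expensive tagger once; every later call gets the cached object."""
    time.sleep(0.2)  # stand-in for the ~1-2 s tag-layer load (hypothetical)
    return object()  # placeholder for a real tagger instance


if __name__ == "__main__":
    for attempt in (1, 2):
        start = time.perf_counter()
        get_tagger()
        print(f"call {attempt}: {time.perf_counter() - start:.3f} s")
```

lru_cache keeps its cache coherent across threads, but two threads racing on the very first call can both run the loader; guarding the first build with a lock avoids the duplicate work if that matters.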
License
Apache-2.0 for the source. EstNLTK is dual-licensed
GPL-2.0 OR Apache-2.0 (we use the Apache-2.0 option). The bundled
Vabamorf analyzer is LGPL-2.1 with a separate commercial-use license.
The bundled Estonian fastText model file (used by find_related_words)
is CC-BY-SA-3.0 — its share-alike obligation applies to the model
file only, not the rest of the project. See NOTICE for full
attribution and obligations when redistributing.