Steuermann - A Universal Agentic AI Orchestration Platform
Steuermann (German for "steersman") is a self-hosted framework for building agentic AI systems that run entirely on-premise. It orchestrates multi-agent workflows through LangGraph, retrieves domain knowledge via RAG, remembers context across conversations, and presents everything through a modern React frontend — all without sending a single byte to external cloud AI services.
It is designed as a reusable template: one codebase, many deployments. Each deployment adapts to a specific domain (medical, financial, operational, analytical) through declarative profile overlays — configuration files that customize behavior without touching a line of framework code.
Experimental Beta — Steuermann is under active development. Core architecture, APIs, configuration schemas, and feature surface may change between releases. It is not yet production-ready; many additional features will land in upcoming releases.
Why Steuermann?
Most agentic AI frameworks require cloud-hosted LLMs, lack proper UI integration, or force you into a single-domain mold. Steuermann takes a different approach:
- Your infrastructure, your data. Runs entirely on-premise with profile-owned provider configuration for LM Studio, Ollama, and other OpenAI-compatible endpoints. No data leaves your network unless you explicitly configure an external provider.
- Not just a library — a complete application. Ships with a production frontend (chat, settings, metrics, analytics), a FastAPI backend, and Docker Compose orchestration. docker compose up gives you a working AI assistant.
- Domain-agnostic by design. The same framework powers a medical knowledge assistant, a financial analysis tool, or an internal operations chatbot. Profiles customize everything through YAML — no forks, no rewrites.
- Multi-agent, not multi-hack. LangGraph owns the control flow. CrewAI crews (Research, Analytics, Code Generation, Planning) are invoked as graph nodes and return structured results. Clean separation, predictable behavior.
- Memory that matters. Semantic memory with importance scoring, co-occurrence linking, and knowledge graphs — not just a vector store dump. The system learns which memories are relevant and surfaces them intelligently.
Architecture
┌────────────────────────────────────────────────────────────┐
│                Next.js Frontend (Port 3000)                │
│      Chat · Settings · Metrics Dashboard · Analytics       │
└──────────────────────┬─────────────────────────────────────┘
                       │
┌──────────────────────▼─────────────────────────────────────┐
│                FastAPI Adapter (Port 8001)                 │
│    Auth · Settings API · Metrics Proxy · Chat Streaming    │
└──────────────────────┬─────────────────────────────────────┘
                       │
┌──────────────────────▼─────────────────────────────────────┐
│        LangGraph Orchestrator (Port 8000, internal)        │
│   Graph Execution · Tool Routing · Memory · RAG · Crews    │
└────┬──────────┬───────────┬─────────────┬──────────────────┘
     │          │           │             │
  Qdrant   PostgreSQL     Redis   LM Studio (host)
  Vectors  Checkpoints    Cache      Local LLMs
| Service | Port | Role |
|---|---|---|
| Next.js | 3000 | React frontend — chat, settings, metrics, analytics |
| FastAPI | 8001 | Backend adapter — auth, settings, metrics proxy, streaming chat |
| LangGraph | 8000 (internal) | Orchestration engine — graph execution, tool routing, memory |
| PostgreSQL | 5432 (internal) | Conversations, user data, checkpoints, audit logs |
| Qdrant | 6333 (internal) | Vector database — RAG embeddings and Mem0 internal storage |
| Redis | 6379 (internal) | Response caching, message broker |
| Prometheus | 9090 (internal) | Metrics collection and alerting |
| LM Studio | 1234 (host) | Local LLM server — runs on host for GPU access |
By default only the frontend (3000) and FastAPI (8001) are bound to the host. Internal services (Qdrant, Prometheus, PostgreSQL, Redis) are accessible only within the Docker network. See step 2 of the Quick Start below to expose them during development.
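As an illustration of how that isolation is typically expressed in Compose (a simplified sketch, not the project's actual docker-compose.yml): host-bound services publish ports, while internal-only services rely on the shared Docker network.

# Simplified sketch, not the real compose file
services:
  fastapi:
    ports:
      - "8001:8001"     # published to the host
  qdrant:
    expose:
      - "6333"          # reachable only from other containers on the Docker network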
Features
Multi-Agent Orchestration
LangGraph is the single source of truth for control flow. It decides what happens and when. CrewAI provides the collaborative reasoning, but only as workers invoked from graph nodes — they never make routing decisions.
- 4 specialized crews ship out of the box:
- Research Crew — web research, RAG retrieval, and synthesis across multiple agents (Searcher → Analyst → Writer)
- Analytics Crew — data analysis with pattern recognition, statistical validation, and business insight generation
- Code Generation Crew — software development pipeline with Architect, Developer, and QA Engineer agents
- Planning Crew — project planning with requirements analysis, task decomposition, and risk assessment
- Configurable process types: sequential, hierarchical, or consensus-based agent collaboration
- Crew result caching with configurable TTL to avoid redundant multi-agent executions
- Timeout and retry logic with structured error handling per crew
CrewAI crews are currently disabled by default via multi_agent_crews: false in config/features.yaml. They are fully implemented and tested but pending fine-tuning for production workloads.
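To experiment with crews in a specific deployment, the flag can be flipped in a profile overlay instead of the base file. A minimal sketch, assuming the overlay reuses the same key as config/features.yaml (the exact schema may differ):

# config/profiles/<profile_id>/features.yaml (illustrative sketch only)
multi_agent_crews: true   # re-enable CrewAI crews for this profile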
Advanced Memory System
Memory is not an afterthought — it is an explicit, first-class part of the execution graph. Memory nodes load relevant context before the LLM responds and persist new memories after.
- Mem0 OSS embedded backend with Qdrant-backed vector storage — no external memory service required
- Semantic search via Mem0's retrieval pipeline with configurable similarity thresholds
- Importance scoring — multi-factor ranking based on relevance, recency (exponential decay), access frequency (logarithmic), and explicit user feedback
- User rating feedback loop — memories rated after retrieval are tracked via Prometheus counters; feedback coverage visible in the Metrics Trends dashboard
- Co-occurrence linking — automatically builds a knowledge graph by tracking which memories are retrieved together, enabling context expansion and related-memory discovery
- Memory summarization — compresses and synthesizes older memories to maintain quality without unbounded growth
- Explicit lifecycle — memory load and update operations are dedicated graph nodes, not hidden side effects
- Configurable extraction mode — memory.mem0.infer_enabled can switch between full Mem0 extraction (true) and verbatim persistence fallback (false)
- Auxiliary-role extraction binding — memory extraction uses the llm.roles.auxiliary model path, so extraction capacity is controlled via auxiliary role model/context settings
- Full CRUD API — /api/memories endpoints with list, detail, delete, stats, and rate; /memories frontend page for user-facing memory management
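A rough sketch of how the two knobs mentioned above might sit in a profile's core.yaml overlay. The dot paths come from this section, but the surrounding structure is an assumption, so treat config/core.yaml as the authoritative schema:

# Illustrative overlay sketch; verify against config/core.yaml
memory:
  mem0:
    infer_enabled: true        # full Mem0 extraction; false = verbatim persistence fallback
llm:
  roles:
    auxiliary:
      model: llama-3.1-8b      # hypothetical model id; extraction capacity follows this role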
RAG & Knowledge Ingestion
A containerized ingestion service watches your document directories and automatically indexes everything into Qdrant for retrieval-augmented generation.
- Supported formats: PDF, DOCX, Markdown (.md, .markdown), Plain Text (.txt)
- Recursive directory discovery — drop files into any subfolder and they are found automatically
- Watch mode with three layers of reliability:
- Watchdog filesystem monitoring for instant detection
- Periodic fallback sweep every 30 seconds for missed events
- Initial startup sweep to ingest all existing documents
- Incremental processing — file hashing skips unchanged documents; changed files have old chunks replaced automatically
- Language detection — every chunk is tagged with detected_language, language_confidence, and target_language metadata. All languages are accepted
- Auto-deletion — removing a source file automatically purges its chunks from Qdrant
- Configurable chunking with overlap, batch embedding, and concurrent file processing
- Phase timing metrics — per-file performance breakdown (parse, chunk, embed, upsert)
Extensible Tool System
Tools are discovered dynamically from YAML manifests and can be LangChain-native Python classes or remote MCP (Model Context Protocol) servers.
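To make that concrete, here is a hypothetical manifest entry; the field names are illustrative only, not the project's actual schema (see docs/tool_development_guide.md for that):

# Hypothetical tool manifest sketch; see docs/tool_development_guide.md for the real schema
tools:
  - name: calculator
    type: langchain              # or "mcp" for a remote MCP tool server
    description: "Math operations, unit conversions, statistics"
    enabled: true
    rate_limit:
      requests_per_minute: 30    # per-tool sliding-window limit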
- Built-in tools:
- DateTime — current time, timezone conversions
- Calculator — math operations, unit conversions, statistics
- File Operations — sandboxed file read/write/list with permission controls
- Workspace File Operations — workspace-specific file handling with intent detection
- Web Search — DuckDuckGo search via MCP server
- Webpage Extraction — fetch and parse web page content via MCP
- MCP server integration — connect any MCP-compatible tool server using the official SDK with streamable HTTP transport
- Semantic tool routing — queries are scored against tool descriptions using similarity matching with intent detection, so the right tools are selected without brittle keyword rules
- Three tool-calling modes: native (model function calling), structured (JSON schema in prompt), and react (Thought → Action → Observation loop)
- Automatic mode downgrade — detects if a model doesn't support native tool calling and automatically falls back to structured mode without breaking the conversation flow
- Multi-provider LLM support — role-based provider chains with automatic fallback and LiteLLM router policy, owned by config/profiles/<profile_id>/core.yaml. LLM capability probing detects tool-calling support at startup and on model changes
- Tool-calling mode enforcement — mode validation ensures consistency across all tool routing layers (prefilter, routing decision, and Layer 2 invocation nodes)
- Tool sandboxing with permission-based access control
- Per-tool rate limiting with sliding window enforcement
- Profile-level tool configuration — enable, disable, or reconfigure tools per deployment profile
Multilingual Support
The framework is language-aware at every layer — from LLM model selection to prompt engineering to embedding generation.
- Per-language LLM models — configure different models for different languages (e.g., llama-3.1-8b for English, llama-3.1-8b-german for German)
- Multilingual embeddings — default model (text-embedding-granite-278m-multilingual, 768 dimensions) supports all major languages
- Per-language prompt files — system prompts, synthesis templates, and language enforcement rules stored in config/prompts/{lang}.yaml with profile-level overrides
- Frontend i18n — the UI adapts to the selected language via structured message bundles
- Language auto-detection during ingestion with confidence scoring and metadata tagging
- Region-aware search — automatic country/region inference for localized web search queries
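A hypothetical per-language model mapping, just to illustrate the idea; the actual key names live in core.yaml and may well differ:

# Illustrative sketch; check config/core.yaml and docs/configuration.md for the real keys
llm:
  models_by_language:
    en: llama-3.1-8b
    de: llama-3.1-8b-german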
Modern Frontend
A production-ready Next.js application — not a demo chat widget — with real settings management, analytics, and operational dashboards.
- Chat interface with streaming responses, Markdown rendering, source footnotes, and conversation history
- Settings panel — model selection, language preferences, tool toggles, RAG configuration
- Metrics dashboard at /metrics with two views:
  - Real-Time — requests, tokens, latency, active sessions, attachment stats, LLM call breakdown, live memory metrics panel (auto-refreshes every 10 seconds)
  - Trends — usage over time, token consumption, latency analysis, memory trends, retrieval feedback loop panel, summary cards, CSV export
- Memory management at /memories — browse, rate, and delete individual memories with full filtering and sorting
- Workspace sidebar for managing uploaded documents per conversation
- Dark/light theme with system preference detection
- Profile-aware branding — colors, logos, and labels adapt to the active deployment profile
- Optional authentication — session-based login via AUTH_ENABLED=true with scrypt password hashing
Performance & Reliability
- Multi-layer caching: LLM response cache, memory query cache, and crew result cache backed by Redis (with graceful fallback to in-memory)
- 4 eviction policies: LRU, LFU, FIFO, TTL — selectable per cache layer
- Cache warming — preload common queries on startup for immediate responsiveness
- gzip compression for cached large responses with automatic stats tracking
- Conversation summarization — when conversations grow beyond a configurable length, older messages are automatically summarized to reduce token usage (40–60% token reduction)
- Circuit breakers — prevent cascading failures when downstream services are unhealthy
- Rate limiting — per-user request throttling via SlowAPI with configurable limits
- Qdrant snapshots — vector database backup and restore support
Monitoring & Observability
- Prometheus metrics for every layer: graph requests, node duration, token usage per model, LLM calls, memory operations, cache hit rates, attachment pipeline stats
- Frontend metrics dashboard — queries Prometheus via FastAPI, no direct Prometheus UI access needed
- Structured JSON logging via structlog with contextual fields (user_id, session_id, node, profile)
- Custom alert rules — Prometheus alerting configuration in monitoring/alerts.yml
- Performance regression tests — benchmarks for cache throughput and latency
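For orientation, a rule in monitoring/alerts.yml follows standard Prometheus syntax; the metric name and thresholds below are placeholders, not Steuermann's actual metric names:

# Standard Prometheus alert rule format; metric and thresholds are placeholders
groups:
  - name: steuermann-example
    rules:
      - alert: HighRequestLatency
        expr: histogram_quantile(0.95, rate(request_latency_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 request latency above 2 seconds for 10 minutes"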
Profile-Based Deployment
One codebase serves unlimited deployment scenarios. Each domain gets a profile — a set of YAML overlays that customize behavior without code changes.
- Profile structure: config/profiles/<profile_id>/ with profile.yaml, core.yaml, features.yaml, agents.yaml, tools.yaml, ui.yaml
- What profiles control: LLM provider and model selection, prompt templates, enabled tools, crew configuration, UI branding/theming, feature flags, knowledge collection references
- Activation: set PROFILE_ID=medical-ai-de in .env and restart
- Plugin system: profile-specific tools in plugins/ are auto-discovered via the tool registry
- Starter profile ships as a working template to copy and customize
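As a rough illustration of how small an overlay can be, a hypothetical ui.yaml might override only branding and inherit everything else. Field names here are assumptions; docs/profile_creation.md documents the real schema:

# config/profiles/medical-ai-de/ui.yaml (hypothetical sketch)
branding:
  app_name: "Klinik Assistent"
  primary_color: "#0b5fa5"
  logo: assets/clinic-logo.svg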
Security Model
Steuermann is designed for internal, trusted deployments behind your network perimeter.
- Optional session-based auth with scrypt password hashing and configurable session secrets
- Bearer token boundary between Next.js and FastAPI (CHAT_ACCESS_TOKEN)
- Network isolation by default — Qdrant, Prometheus, PostgreSQL, and Redis are only accessible within the Docker network; only the frontend and API are exposed to the host
- Per-user rate limiting enabled by default
- Tool sandboxing with permission-based access control
- No data exfiltration — local LLMs, local vector store, local database
- CORS configuration with allowlisted origins
Public-facing, multi-tenant, and zero-trust deployments are out of scope for this release.
Quick Start
Prerequisites
- Docker & Docker Compose
- LM Studio running on the host machine, or another provider endpoint configured for the active profile
1. Clone and configure
git clone https://github.com/mediaworkbench/steuermann-ai.git
cd steuermann-ai
cp .env.example .env
# Edit .env — at minimum set POSTGRES_PASSWORD, PROFILE_ID, and the provider endpoint vars used by that profile
poetry install
poetry run steuermann setup doctor --format json
poetry run steuermann config validate --format json
poetry run steuermann config contract-check --format json
poetry run steuermann docs check --format json
Linux hosts (bind-mount permissions): set UID/GID in .env before first build so container users match your host account.
echo "APP_UID=$(id -u)" >> .env
echo "APP_GID=$(id -g)" >> .env
mkdir -p ./data/workspaces ./data/checkpoints ./data/rag-data
chown -R $(id -u):$(id -g) ./data/workspaces ./data/checkpoints ./data/rag-data
2. (Optional) Expose internal services for local development
By default, Qdrant (6333), Prometheus (9090), PostgreSQL (5432), and Redis (6379) are only reachable within the Docker network. To access them from your host — for example to use the Qdrant dashboard or query Prometheus directly — copy the override template:
cp docker-compose.override.yml.example docker-compose.override.yml
docker compose up -d
Docker Compose automatically merges docker-compose.override.yml, exposing the dev ports. Delete the file to return to production-safe network isolation.
You can also use the same override file to remap ports if the defaults conflict with other local services:
# docker-compose.override.yml
services:
nextjs:
ports:
- "9000:3000" # remap Next.js to 9000
fastapi:
ports:
- "9001:8001" # remap FastAPI to 9001
If you remap ports, update the corresponding .env variables:
- NEXT_PUBLIC_API_BASE if FastAPI moves to a non-standard port
- QDRANT_PORT if Qdrant is remapped
- POSTGRES_PORT if PostgreSQL is remapped
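For the remap shown above, the matching .env change would look roughly like this (the URL format is an assumption about how the browser reaches FastAPI):

# .env, matching the 9001 FastAPI remap in the override example above
NEXT_PUBLIC_API_BASE=http://localhost:9001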
Raspberry Pi 5 / ARM64: if Qdrant fails with Unsupported system page size, add this to your override:
services:
qdrant:
image: haktansuren/qdrant-pi5-fixed-jemalloc
3. Start the stack
docker compose up -d
| URL | What you get |
|---|---|
| http://localhost:3000 | Chat, settings, metrics dashboard |
| http://localhost:8001 | FastAPI API |
| http://localhost:6333/dashboard | Qdrant dashboard (requires docker-compose.override.yml) |
| http://localhost:9090 | Prometheus (requires docker-compose.override.yml) |
Under the hood, docker compose up -d also starts PostgreSQL (5432), Redis (6379), the DuckDuckGo MCP server, and the ingestion worker (which watches RAG_DATA_PATH for documents). LangGraph (8000), Qdrant (6333), Prometheus (9090), PostgreSQL (5432), and Redis (6379) are internal-only by default and are not exposed to the host unless you use the override file (see step 2).
4. (Optional) Enable authentication
Set these in .env:
AUTH_ENABLED=true
AUTH_USERNAME=admin
AUTH_PASSWORD_HASH='scrypt$...$...' # see .env.example for generation command
AUTH_SESSION_SECRET=$(python -c "import secrets; print(secrets.token_hex(32))")
CHAT_ACCESS_TOKEN=$(python -c "import secrets; print(secrets.token_hex(32))")
Rebuild and visit http://localhost:3000/login:
docker compose up -d --build
5. (Optional) Ingest domain knowledge
# Prepare a document directory (adjust RAG_DATA_PATH in .env)
mkdir -p ./data/rag-data
# Drop PDF, DOCX, Markdown, or TXT files into the directory
cp /path/to/your/docs/*.pdf ./data/rag-data/
# The ingestion container watches the directory and indexes automatically
docker compose up -d ingestion
Watch mode detects new files in real time, performs periodic sweeps every 30 seconds, and removes chunks when source files are deleted. Configure ingestion language, thresholds, batching, and collection settings in config/profiles/<profile_id>/core.yaml; keep .env for mount paths and service wiring only.
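As a rough idea of the kind of settings that live there (key names are illustrative, not the authoritative schema; see docs/ingestion.md):

# Illustrative core.yaml ingestion sketch; see docs/ingestion.md for the real schema
rag:
  ingestion:
    chunk_size: 800                # chunk size (unit depends on the chunker)
    chunk_overlap: 100
    sweep_interval_seconds: 30     # periodic fallback sweep, matches watch mode above
    collection: knowledge_base     # hypothetical collection name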
For host-side development and local test execution with Poetry, see docs/index.md.
6. (Optional) Scaffold a new profile
poetry run steuermann profile scaffold --from starter --profile my-profile
poetry run steuermann config validate --profile my-profile --format json
Configuration
Configuration follows a three-layer hierarchy: Base → Profile Overlay → Environment Variables.
config/
├── core.yaml # LLM providers, embeddings, token budgets, RAG
├── agents.yaml # CrewAI crew and agent definitions
├── tools.yaml # Tool manifests and routing
├── features.yaml # Feature flags (caching, crews, etc.)
├── prompts/
│ ├── en.yaml # English system prompts
│ └── de.yaml # German system prompts
└── profiles/
└── starter/ # Default profile (copy to create your own)
├── profile.yaml # Profile metadata
├── core.yaml # LLM/memory/RAG overrides
├── features.yaml # Feature flag overrides
├── agents.yaml # Crew overrides
├── tools.yaml # Tool overrides
└── ui.yaml # Branding and theme
Activate a profile by setting PROFILE_ID in .env. See docs/configuration.md for the full schema reference.
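To see where any effective value comes from across those three layers, the config explain command (documented in the CLI table below) can trace a single key, for example:

poetry run steuermann config explain --key memory.mem0.infer_enabled --format json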
Operations CLI
The steuermann CLI ships as the single operational interface for validation, diagnostics, scaffolding, and documentation conformance. It is available after poetry install:
poetry run steuermann --help
| Command | Purpose |
|---|---|
| steuermann profile active | Show active profile id, directory, and metadata validity |
| steuermann profile scaffold --from starter --profile <id> | Create a new profile overlay from a template |
| steuermann profile bundle export --profile <id> --out <file> | Package a profile into a portable .tar.gz bundle |
| steuermann profile bundle import --bundle <file> --profile <id> | Import and validate a profile bundle |
| steuermann config show [--section <s>] | Render the fully merged effective configuration |
| steuermann config explain --key <dot.path> | Trace the source of a specific config key (base / overlay / env) |
| steuermann config validate [--strict] | Validate schema, required files, and env substitutions |
| steuermann config set --profile <id> --key <k> --value <v> | Dry-run set of a profile-safe key (add --apply --confirm APPLY to persist) |
| steuermann config unset --profile <id> --key <k> | Dry-run removal of a profile-safe key override |
| steuermann config contract-check | Verify CLI contract parity with the runtime loader |
| steuermann setup doctor [--probe-endpoints] | Preflight checks: env vars, endpoints, profile alignment |
| steuermann docs check [--strict] | Documentation conformance drift report (read-only) |
| steuermann ingest ingest --source <dir> --collection <name> | Index documents into Qdrant |
| steuermann ingest watch --source <dir> --collection <name> | Live watch mode with automatic re-indexing |
| steuermann ingest validate --source <dir> | Parse-only validation without writing to Qdrant |
| steuermann ingest reindex --source <dir> --collection <name> | Clear and rebuild a collection |
Every command supports --format json for CI-compatible machine-readable output with deterministic exit codes.
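That makes the CLI straightforward to wire into CI. For example, a pipeline step could run the validators and rely on their exit codes:

# Example CI step; commands taken from the table above
poetry run steuermann config validate --strict --format json
poetry run steuermann config contract-check --format json
poetry run steuermann docs check --strict --format json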
See docs/cli.md for the full CLI reference including argument details, guardrail behaviour, and workflow examples.
Documentation
Start here: docs/index.md — navigation hub for all documentation.
| Document | Description |
|---|---|
| docs/index.md | Complete documentation index |
| docs/cli.md | Operations CLI reference (steuermann) |
| docs/technical_architecture.md | System design, service boundaries, data flow |
| docs/configuration.md | Configuration schema and reference |
| docs/profile_creation.md | Step-by-step guide to creating domain profiles |
| docs/tool_development_guide.md | Building custom tools and MCP integrations |
| docs/crewai_extension_guide.md | Adding new CrewAI crews |
| docs/ingestion.md | RAG ingestion pipeline reference |
| docs/monitoring.md | Prometheus metrics and dashboard guide |
| docs/performance_optimization.md | Caching, load testing, benchmarking |
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Orchestration | LangGraph ≥1.1 | Workflow engine, state management, checkpointing |
| Multi-Agent | CrewAI ≥1.14 | Role-based collaborative agent reasoning |
| LLM Integration | LangChain ≥1.2 + langchain-litellm 0.6 | Unified LLM interface via LiteLLM router |
| Backend | FastAPI ≥0.135 | REST API, auth, metrics proxy |
| Frontend | Next.js ≥16 + React 19 | Production UI with App Router |
| Vector Store | Qdrant ≥1.7 | RAG embeddings and Mem0 internal vector store |
| Database | PostgreSQL ≥15 | Conversations, checkpoints, users, audit logs |
| Memory | Mem0 OSS ≥2.0 | Memory abstraction with embedded Qdrant backend |
| Embeddings | Multilingual models (local, LM Studio) | OpenAI-compatible API |
| Cache | Redis ≥5 | Response caching, session data |
| Monitoring | Prometheus | Metrics collection and alerting |
| LLM Server | LM Studio | Local model hosting with GPU acceleration |
| Tools | MCP SDK | Model Context Protocol integration |
Status
Steuermann is in experimental beta. The core orchestration, memory, RAG pipeline, frontend, and monitoring stack are stable and tested. The following areas are actively evolving:
- CrewAI crew fine-tuning for production workloads (currently disabled by default)
- Additional document parsers and ingestion formats
- Extended MCP tool ecosystem
- Multi-user workspace features