steuermann-ai

agent
Guvenlik Denetimi
Uyari
Health Uyari
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 8 GitHub stars
Code Gecti
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Gecti
  • Permissions — No dangerous permissions requested

Bu listing icin henuz AI raporu yok.

SUMMARY

Steuermann is designed as a domain-agnostic, on-premise agentic AI framework (harness). It can be extended with profiles to adapt to specific use cases.

README.md

Steuermann - A Universal Agentic AI Orchestration Platform

Steuermann

Build domain-specific, multi-agent AI applications — entirely on your own infrastructure.

Python 3.11+
LangGraph
Next.js
Docker
Status

Steuermann (german for steersman) is a self-hosted framework for building agentic AI systems that run entirely on-premise. It orchestrates multi-agent workflows through LangGraph, retrieves domain knowledge via RAG, remembers context across conversations, and presents everything through a modern React frontend — all without sending a single byte to external cloud AI services.

It is designed as a reusable template: one codebase, many deployments. Each deployment adapts to a specific domain (medical, financial, operational, analytical) through declarative profile overlays — configuration files that customize behavior without touching a line of framework code.

Experimental Beta — Steuermann is under active development. Core architecture, APIs, configuration schemas, and feature surface may change between releases. This is not yet production ready and many additional features will be implemented in the coming releases!

Why Steuermann?

Most agentic AI frameworks require cloud-hosted LLMs, lack proper UI integration, or force you into a single-domain mold. Steuermann takes a different approach:

  • Your infrastructure, your data. Runs entirely on-premise with profile-owned provider configuration for LM Studio, Ollama, and other OpenAI-compatible endpoints. No data leaves your network unless you explicitly configure an external provider.
  • Not just a library — a complete application. Ships with a production frontend (chat, settings, metrics, analytics), a FastAPI backend, and Docker Compose orchestration. docker compose up gives you a working AI assistant.
  • Domain-agnostic by design. The same framework powers a medical knowledge assistant, a financial analysis tool, or an internal operations chatbot. Profiles customize everything through YAML — no forks, no rewrites.
  • Multi-agent, not multi-hack. LangGraph owns the control flow. CrewAI crews (Research, Analytics, Code Generation, Planning) are invoked as graph nodes and return structured results. Clean separation, predictable behavior.
  • Memory that matters. Semantic memory with importance scoring, co-occurrence linking, and knowledge graphs — not just a vector store dump. The system learns which memories are relevant and surfaces them intelligently.

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Next.js Frontend (Port 3000)                               │
│  Chat · Settings · Metrics Dashboard · Analytics            │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  FastAPI Adapter (Port 8001)                                │
│  Auth · Settings API · Metrics Proxy · Chat Streaming       │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  LangGraph Orchestrator (Port 8000, internal)               │
│  Graph Execution · Tool Routing · Memory · RAG · Crews      │
└────┬─────────┬───────────┬───────────┬──────────────────────┘
     │         │           │           │
   Qdrant   PostgreSQL   Redis      LM Studio (host)
   Vectors  Checkpoints  Cache      Local LLMs
Service Port Role
Next.js 3000 React frontend — chat, settings, metrics, analytics
FastAPI 8001 Backend adapter — auth, settings, metrics proxy, streaming chat
LangGraph 8000 (internal) Orchestration engine — graph execution, tool routing, memory
PostgreSQL 5432 (internal) Conversations, user data, checkpoints, audit logs
Qdrant 6333 (internal) Vector database — RAG embeddings and Mem0 internal storage
Redis 6379 (internal) Response caching, message broker
Prometheus 9090 (internal) Metrics collection and alerting
LM Studio 1234 (host) Local LLM server — runs on host for GPU access

By default only the frontend (3000) and FastAPI (8001) are bound to the host. Internal services (Qdrant, Prometheus, PostgreSQL, Redis) are accessible only within the Docker network. See step 3 below to expose them during development.


Features

Multi-Agent Orchestration

LangGraph is the single source of truth for control flow. It decides what happens and when. CrewAI provides the collaborative reasoning, but only as workers invoked from graph nodes — they never make routing decisions.

  • 4 specialized crews ship out of the box:
    • Research Crew — web research, RAG retrieval, and synthesis across multiple agents (Searcher → Analyst → Writer)
    • Analytics Crew — data analysis with pattern recognition, statistical validation, and business insight generation
    • Code Generation Crew — software development pipeline with Architect, Developer, and QA Engineer agents
    • Planning Crew — project planning with requirements analysis, task decomposition, and risk assessment
  • Configurable process types: sequential, hierarchical, or consensus-based agent collaboration
  • Crew result caching with configurable TTL to avoid redundant multi-agent executions
  • Timeout and retry logic with structured error handling per crew

CrewAI crews are currently disabled by default via multi_agent_crews: false in config/features.yaml. They are fully implemented and tested but pending fine-tuning for production workloads.

Advanced Memory System

Memory is not an afterthought — it is an explicit, first-class part of the execution graph. Memory nodes load relevant context before the LLM responds and persist new memories after.

  • Mem0 OSS embedded backend with Qdrant-backed vector storage — no external memory service required
  • Semantic search via Mem0's retrieval pipeline with configurable similarity thresholds
  • Importance scoring — multi-factor ranking based on relevance, recency (exponential decay), access frequency (logarithmic), and explicit user feedback
  • User rating feedback loop — memories rated after retrieval are tracked via Prometheus counters; feedback coverage visible in the Metrics Trends dashboard
  • Co-occurrence linking — automatically builds a knowledge graph by tracking which memories are retrieved together, enabling context expansion and related-memory discovery
  • Memory summarization — compresses and synthesizes older memories to maintain quality without unbounded growth
  • Explicit lifecycle — memory load and update operations are dedicated graph nodes, not hidden side effects
  • Configurable extraction modememory.mem0.infer_enabled can switch between full Mem0 extraction (true) and verbatim persistence fallback (false)
  • Auxiliary-role extraction binding — memory extraction uses the llm.roles.auxiliary model path, so extraction capacity is controlled via auxiliary role model/context settings
  • Full CRUD API/api/memories endpoints with list, detail, delete, stats, and rate; /memories frontend page for user-facing memory management

RAG & Knowledge Ingestion

A containerized ingestion service watches your document directories and automatically indexes everything into Qdrant for retrieval-augmented generation.

  • Supported formats: PDF, DOCX, Markdown (.md, .markdown), Plain Text (.txt)
  • Recursive directory discovery — drop files into any subfolder and they are found automatically
  • Watch mode with three layers of reliability:
    • Watchdog filesystem monitoring for instant detection
    • Periodic fallback sweep every 30 seconds for missed events
    • Initial startup sweep to ingest all existing documents
  • Incremental processing — file hashing skips unchanged documents; changed files have old chunks replaced automatically
  • Language detection — every chunk is tagged with detected_language, language_confidence, and target_language metadata. All languages are accepted
  • Auto-deletion — removing a source file automatically purges its chunks from Qdrant
  • Configurable chunking with overlap, batch embedding, and concurrent file processing
  • Phase timing metrics — per-file performance breakdown (parse, chunk, embed, upsert)

Extensible Tool System

Tools are discovered dynamically from YAML manifests and can be LangChain-native Python classes or remote MCP (Model Context Protocol) servers.

  • Built-in tools:
    • DateTime — current time, timezone conversions
    • Calculator — math operations, unit conversions, statistics
    • File Operations — sandboxed file read/write/list with permission controls
    • Workspace File Operations — workspace-specific file handling with intent detection
    • Web Search — DuckDuckGo search via MCP server
    • Webpage Extraction — fetch and parse web page content via MCP
  • MCP server integration — connect any MCP-compatible tool server using the official SDK with streamable HTTP transport
  • Semantic tool routing — queries are scored against tool descriptions using similarity matching with intent detection, so the right tools are selected without brittle keyword rules
  • Three tool-calling modes: native (model function calling), structured (JSON schema in prompt), and react (Thought → Action → Observation loop)
  • Automatic mode downgrade — detects if a model doesn't support native tool calling and automatically falls back to structured mode without breaking the conversation flow
  • Multi-provider LLM support — role-based provider chains with automatic fallback and LiteLLM router policy, owned by config/profiles/<profile_id>/core.yaml. LLM capability probing detects tool-calling support at startup and on model changes
  • Tool-calling mode enforcement — mode validation ensures consistency across all tool routing layers (prefilter, routing decision, and Layer 2 invocation nodes)
  • Tool sandboxing with permission-based access control
  • Per-tool rate limiting with sliding window enforcement
  • Profile-level tool configuration — enable, disable, or reconfigure tools per deployment profile

Multilingual Support

The framework is language-aware at every layer — from LLM model selection to prompt engineering to embedding generation.

  • Per-language LLM models — configure different models for different languages (e.g., llama-3.1-8b for English, llama-3.1-8b-german for German)
  • Multilingual embeddings — default model (text-embedding-granite-278m-multilingual, 768 dimensions) supports all major languages
  • Per-language prompt files — system prompts, synthesis templates, and language enforcement rules stored in config/prompts/{lang}.yaml with profile-level overrides
  • Frontend i18n — the UI adapts to the selected language via structured message bundles
  • Language auto-detection during ingestion with confidence scoring and metadata tagging
  • Region-aware search — automatic country/region inference for localized web search queries

Modern Frontend

A production-ready Next.js application — not a demo chat widget — with real settings management, analytics, and operational dashboards.

  • Chat interface with streaming responses, Markdown rendering, source footnotes, and conversation history
  • Settings panel — model selection, language preferences, tool toggles, RAG configuration
  • Metrics dashboard at /metrics with two views:
    • Real-Time — requests, tokens, latency, active sessions, attachment stats, LLM call breakdown, live memory metrics panel (auto-refreshes every 10 seconds)
    • Trends — usage over time, token consumption, latency analysis, memory trends, retrieval feedback loop panel, summary cards, CSV export
  • Memory management at /memories — browse, rate, and delete individual memories with full filtering and sorting
  • Workspace sidebar for managing uploaded documents per conversation
  • Dark/light theme with system preference detection
  • Profile-aware branding — colors, logos, and labels adapt to the active deployment profile
  • Optional authentication — session-based login via AUTH_ENABLED=true with scrypt password hashing

Performance & Reliability

  • Multi-layer caching: LLM response cache, memory query cache, and crew result cache backed by Redis (with graceful fallback to in-memory)
  • 4 eviction policies: LRU, LFU, FIFO, TTL — selectable per cache layer
  • Cache warming — preload common queries on startup for immediate responsiveness
  • gzip compression for cached large responses with automatic stats tracking
  • Conversation summarization — when conversations grow beyond a configurable length, older messages are automatically summarized to reduce token usage (40–60% token reduction)
  • Circuit breakers — prevent cascading failures when downstream services are unhealthy
  • Rate limiting — per-user request throttling via SlowAPI with configurable limits
  • Qdrant snapshots — vector database backup and restore support

Monitoring & Observability

  • Prometheus metrics for every layer: graph requests, node duration, token usage per model, LLM calls, memory operations, cache hit rates, attachment pipeline stats
  • Frontend metrics dashboard — queries Prometheus via FastAPI, no direct Prometheus UI access needed
  • Structured JSON logging via structlog with contextual fields (user_id, session_id, node, profile)
  • Custom alert rules — Prometheus alerting configuration in monitoring/alerts.yml
  • Performance regression tests — benchmarks for cache throughput and latency

Profile-Based Deployment

One codebase serves unlimited deployment scenarios. Each domain gets a profile — a set of YAML overlays that customize behavior without code changes.

  • Profile structure: config/profiles/<profile_id>/ with profile.yaml, core.yaml, features.yaml, agents.yaml, tools.yaml, ui.yaml
  • What profiles control: LLM provider and model selection, prompt templates, enabled tools, crew configuration, UI branding/theming, feature flags, knowledge collection references
  • Activation: set PROFILE_ID=medical-ai-de in .env and restart
  • Plugin system: profile-specific tools in plugins/ are auto-discovered via the tool registry
  • Starter profile ships as a working template to copy and customize

Security Model

Steuermann is designed for internal, trusted deployments behind your network perimeter.

  • Optional session-based auth with scrypt password hashing and configurable session secrets
  • Bearer token boundary between Next.js and FastAPI (CHAT_ACCESS_TOKEN)
  • Network isolation by default — Qdrant, Prometheus, PostgreSQL, and Redis are only accessible within the Docker network; only the frontend and API are exposed to the host
  • Per-user rate limiting enabled by default
  • Tool sandboxing with permission-based access control
  • No data exfiltration — local LLMs, local vector store, local database
  • CORS configuration with allowlisted origins

Public-facing, multi-tenant, and zero-trust deployments are out of scope for this release.


Quick Start

Prerequisites

  • Docker & Docker Compose
  • LM Studio running on the host machine, or another provider endpoint configured for the active profile

1. Clone and configure

git clone https://github.com/mediaworkbench/steuermann-ai.git
cd steuermann-ai
cp .env.example .env
# Edit .env — at minimum set POSTGRES_PASSWORD, PROFILE_ID, and the provider endpoint vars used by that profile
poetry install
poetry run steuermann setup doctor --format json
poetry run steuermann config validate --format json
poetry run steuermann config contract-check --format json
poetry run steuermann docs check --format json

Linux hosts (bind-mount permissions): set UID/GID in .env before first build so container users match your host account.

echo "APP_UID=$(id -u)" >> .env
echo "APP_GID=$(id -g)" >> .env
mkdir -p ./data/workspaces ./data/checkpoints ./data/rag-data
chown -R $(id -u):$(id -g) ./data/workspaces ./data/checkpoints ./data/rag-data

2. (Optional) Expose internal services for local development

By default, Qdrant (6333), Prometheus (9090), PostgreSQL (5432), and Redis (6379) are only reachable within the Docker network. To access them from your host — for example to use the Qdrant dashboard or query Prometheus directly — copy the override template:

cp docker-compose.override.yml.example docker-compose.override.yml
docker compose up -d

Docker Compose automatically merges docker-compose.override.yml, exposing the dev ports. Delete the file to return to production-safe network isolation.

You can also use the same override file to remap ports if the defaults conflict with other local services:

# docker-compose.override.yml
services:
  nextjs:
    ports:
      - "9000:3000" # remap Next.js to 9000
  fastapi:
    ports:
      - "9001:8001" # remap FastAPI to 9001

If you remap ports, update the corresponding .env variables:

  • NEXT_PUBLIC_API_BASE if FastAPI moves to a non-standard port.
  • QDRANT_PORT if Qdrant is remapped.
  • POSTGRES_PORT if PostgreSQL is remapped.

Raspberry Pi 5 / ARM64: if Qdrant fails with Unsupported system page size, add this to your override:

services:
  qdrant:
    image: haktansuren/qdrant-pi5-fixed-jemalloc

3. Start the stack

docker compose up -d
URL What you get
http://localhost:3000 Chat, settings, metrics dashboard
http://localhost:8001 FastAPI API
http://localhost:6333/dashboard Qdrant dashboard (requires docker-compose.override.yml)
http://localhost:9090 Prometheus (requires docker-compose.override.yml)

Under the hood, docker compose up -d also starts PostgreSQL (5432), Redis (6379), DuckDuckGo MCP (internal), and the Ingestion worker (watches RAG_DATA_PATH for documents). LangGraph (8000), Qdrant (6333), Prometheus (9090), PostgreSQL (5432), and Redis (6379) are internal-only by default and not exposed to the host unless you use the override file (see step 3).

4. (Optional) Enable authentication

Set these in .env:

AUTH_ENABLED=true
AUTH_USERNAME=admin
AUTH_PASSWORD_HASH='scrypt$...$...'   # see .env.example for generation command
AUTH_SESSION_SECRET=$(python -c "import secrets; print(secrets.token_hex(32))")
CHAT_ACCESS_TOKEN=$(python -c "import secrets; print(secrets.token_hex(32))")

Rebuild and visit http://localhost:3000/login:

docker compose up -d --build

5. (Optional) Ingest domain knowledge

# Prepare a document directory (adjust RAG_DATA_PATH in .env)
mkdir -p ./data/rag-data

# Drop PDF, DOCX, Markdown, or TXT files into the directory
cp /path/to/your/docs/*.pdf ./data/rag-data/

# The ingestion container watches the directory and indexes automatically
docker compose up -d ingestion

Watch mode detects new files in real time, performs periodic sweeps every 30 seconds, and removes chunks when source files are deleted. Configure ingestion language, thresholds, batching, and collection settings in config/profiles/<profile_id>/core.yaml; keep .env for mount paths and service wiring only.

For host-side development and local test execution with Poetry, see docs/index.md.

6. (Optional) Scaffold a new profile

poetry run steuermann profile scaffold --from starter --profile my-profile
poetry run steuermann config validate --profile my-profile --format json

Configuration

Configuration follows a three-layer hierarchy: Base → Profile Overlay → Environment Variables.

config/
├── core.yaml              # LLM providers, embeddings, token budgets, RAG
├── agents.yaml            # CrewAI crew and agent definitions
├── tools.yaml             # Tool manifests and routing
├── features.yaml          # Feature flags (caching, crews, etc.)
├── prompts/
│   ├── en.yaml            # English system prompts
│   └── de.yaml            # German system prompts
└── profiles/
    └── starter/           # Default profile (copy to create your own)
        ├── profile.yaml   # Profile metadata
        ├── core.yaml      # LLM/memory/RAG overrides
        ├── features.yaml  # Feature flag overrides
        ├── agents.yaml    # Crew overrides
        ├── tools.yaml     # Tool overrides
        └── ui.yaml        # Branding and theme

Activate a profile by setting PROFILE_ID in .env. See docs/configuration.md for the full schema reference.


Operations CLI

The steuermann CLI ships as the single operational interface for validation, diagnostics, scaffolding, and documentation conformance. It is available after poetry install:

poetry run steuermann --help
Command Purpose
steuermann profile active Show active profile id, directory, and metadata validity
steuermann profile scaffold --from starter --profile <id> Create a new profile overlay from a template
steuermann profile bundle export --profile <id> --out <file> Package a profile into a portable .tar.gz bundle
steuermann profile bundle import --bundle <file> --profile <id> Import and validate a profile bundle
steuermann config show [--section <s>] Render the fully merged effective configuration
steuermann config explain --key <dot.path> Trace the source of a specific config key (base / overlay / env)
steuermann config validate [--strict] Validate schema, required files, and env substitutions
steuermann config set --profile <id> --key <k> --value <v> Dry-run set of a profile-safe key (add --apply --confirm APPLY to persist)
steuermann config unset --profile <id> --key <k> Dry-run removal of a profile-safe key override
steuermann config contract-check Verify CLI contract parity with the runtime loader
steuermann setup doctor [--probe-endpoints] Preflight checks: env vars, endpoints, profile alignment
steuermann docs check [--strict] Documentation conformance drift report (read-only)
steuermann ingest ingest --source <dir> --collection <name> Index documents into Qdrant
steuermann ingest watch --source <dir> --collection <name> Live watch mode with automatic re-indexing
steuermann ingest validate --source <dir> Parse-only validation without writing to Qdrant
steuermann ingest reindex --source <dir> --collection <name> Clear and rebuild a collection

Every command supports --format json for CI-compatible machine-readable output with deterministic exit codes.

See docs/cli.md for the full CLI reference including argument details, guardrail behaviour, and workflow examples.


Documentation

Start here: docs/index.md — navigation hub for all documentation.

Document Description
docs/index.md Complete documentation index
docs/cli.md Operations CLI reference (steuermann)
docs/technical_architecture.md System design, service boundaries, data flow
docs/configuration.md Configuration schema and reference
docs/profile_creation.md Step-by-step guide to creating domain profiles
docs/tool_development_guide.md Building custom tools and MCP integrations
docs/crewai_extension_guide.md Adding new CrewAI crews
docs/ingestion.md RAG ingestion pipeline reference
docs/monitoring.md Prometheus metrics and dashboard guide
docs/performance_optimization.md Caching, load testing, benchmarking

Technology Stack

Layer Technology Purpose
Orchestration LangGraph ≥1.1 Workflow engine, state management, checkpointing
Multi-Agent CrewAI ≥1.14 Role-based collaborative agent reasoning
LLM Integration LangChain ≥1.2 + langchain-litellm 0.6 Unified LLM interface via LiteLLM router
Backend FastAPI ≥0.135 REST API, auth, metrics proxy
Frontend Next.js ≥16 + React 19 Production UI with App Router
Vector Store Qdrant ≥1.7 RAG embeddings and Mem0 internal vector store
Database PostgreSQL ≥15 Conversations, checkpoints, users, audit logs
Memory Mem0 OSS ≥2.0 Memory abstraction with embedded Qdrant backend
Embeddings Multilingual models (local, LM Studio) OpenAI-compatible API
Cache Redis ≥5 Response caching, session data
Monitoring Prometheus Metrics collection and alerting
LLM Server LM Studio Local model hosting with GPU acceleration
Tools MCP SDK Model Context Protocol integration

Status

Steuermann is in experimental beta. The core orchestration, memory, RAG pipeline, frontend, and monitoring stack are stable and tested. The following areas are actively evolving:

  • CrewAI crew fine-tuning for production workloads (currently disabled by default)
  • Additional document parsers and ingestion formats
  • Extended MCP tool ecosystem
  • Multi-user workspace features

Yorumlar (0)

Sonuc bulunamadi