Agentify the status page insights by mbrailtown · Pull Request #12 · RailtownAI/railengine-examples

mbrailtown · 2026-06-12T23:49:09Z

Updated C# app so it can use a remote agent instead of the inline API calls.

Added a Python example agent that uses railtracks and can work as the remote agent for the C# app.

Ports the CSharp DailyInsightService functionality to a standalone Python FastAPI service. Drives an Anthropic Claude agent through Railtracks with a single `@rt.function_node` tool that calls the Railengine Python SDK directly (replaces the C# MCP attachment). POST /insight returns the same one-line-per-metric plain-text summary so the existing /api/insight card can render it unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t:AgentUrl

When DailyInsight:AgentUrl is set, the 24h loop POSTs to {AgentUrl}/insight instead of calling Anthropic + MCP inline; Anthropic:ApiKey becomes optional since the agent owns the LLM call. Leave AgentUrl empty (or unset) to keep the existing inline path unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ken is set

Lets the C# app talk to a daily-insight agent endpoint sitting behind a bearer-auth reverse proxy. The token is read from configuration and attached as Authorization: Bearer <token> on every /insight POST when non-empty; if left blank, the request goes unauthenticated, matching the existing local-dev behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
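The routing and auth rules from this commit and the AgentUrl commit before it can be sketched in Python. The real change lives in the C# app, so this is only an illustration of the logic: `build_insight_request`, its parameters, and the empty POST body are assumptions, not code from the PR.

```python
from typing import Optional
from urllib.request import Request


def build_insight_request(agent_url: str, token: str = "") -> Optional[Request]:
    """Return the POST request for the remote agent, or None for the inline path."""
    if not agent_url:
        # AgentUrl empty or unset: the caller keeps the existing inline path.
        return None

    # Assumption: the POST carries an empty body.
    request = Request(agent_url.rstrip("/") + "/insight", data=b"", method="POST")
    if token:
        # Attach the bearer token only when one is configured.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Returning `None` for the unset case keeps the "which path do we take" decision in one place, so the caller only has to branch once.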

… env vars

Dockerfile installs the daily_insight package via pyproject.toml and runs uvicorn on :8000, the standard container shape for any deploy target. configure_runtime_env bridges LLM_API_KEY <-> ANTHROPIC_API_KEY at startup so the same setting works under either name; the Anthropic SDK still reads ANTHROPIC_API_KEY internally. INSIGHT_MODEL renamed to the conventional LLM_MODEL. README leads with the provider-neutral names and documents the mapping in one row instead of two.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
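A minimal sketch of what such a bridge could look like. Only the function name `configure_runtime_env` and the two variable names come from the commit message; the body, the optional `env` parameter (added here so the function can be exercised without touching the real environment), and the behavior when both names are set are assumptions.

```python
import os
from typing import MutableMapping, Optional


def configure_runtime_env(env: Optional[MutableMapping[str, str]] = None) -> None:
    """Mirror LLM_API_KEY and ANTHROPIC_API_KEY so either name can be set."""
    env = os.environ if env is None else env
    llm_key = env.get("LLM_API_KEY")
    anthropic_key = env.get("ANTHROPIC_API_KEY")

    if llm_key and not anthropic_key:
        # Provider-neutral name set: expose it under the name the SDK reads.
        env["ANTHROPIC_API_KEY"] = llm_key
    elif anthropic_key and not llm_key:
        # Legacy name set: expose it under the provider-neutral name too.
        env["LLM_API_KEY"] = anthropic_key
    # If both are set, leave them untouched (assumed behavior).
```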

Two changes that together turn opaque 500s into actionable diagnostics:

1. railtownai.init() in the FastAPI lifespan when RAILTOWN_API_KEY is set. The SDK attaches a RailtownHandler to Python's root logger, so any logger.exception/error call downstream ships to Railtown automatically. No-op when the key is unset; the agent runs normally.
2. @app.exception_handler(Exception) that logs the traceback and returns a JSON 500 body with the exception type and message instead of FastAPI's default plain-text "Internal Server Error". Replaces the per-endpoint try/except in /insight so unhandled errors from any route get the same treatment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
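The second change can be illustrated without the web framework. The sketch below builds the kind of status and JSON body a global handler might return; in the service this logic would sit inside the `@app.exception_handler(Exception)` function. The helper name `error_response`, the logger name, and the `error` / `message` field names are assumptions.

```python
import json
import logging
from typing import Tuple

logger = logging.getLogger("daily_insight")


def error_response(exc: Exception) -> Tuple[int, str]:
    """Log the traceback and build a JSON 500 body for an unhandled error."""
    # exc_info=exc makes the logging module include the exception's traceback.
    logger.error("Unhandled error", exc_info=exc)
    body = {"error": type(exc).__name__, "message": str(exc)}
    return 500, json.dumps(body)
```

Because the traceback goes through the standard `logging` module, any handler attached to the root logger (such as the one the first change installs) would receive it as well.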

Switches InsightService from rt.Flow(...).ainvoke() to the rt.Session + rt.call() pattern so we can grab session.payload() and hand it to railtownai.upload_agent_run(). The payload contains the nodes / edges / steps that drive the Railtracks viz UI in Conductr, which is useful for inspecting which tools the agent called and what prompts it used. Skipped silently when railtownai.init() hasn't run (no RAILTOWN_API_KEY) so local invocations without observability keep working unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ailtracks evaluators

Self-contained smoke test for the deployed agent.

Body: { "sample_size": 1, "agent_run_id": "<uuid-or-null>" }

sample_size (default 1, capped at 5) drives how many fresh /insight runs the endpoint generates before scoring. agent_run_id is accepted for forward compatibility: it is currently logged and ignored, and will identify a historical session to evaluate in a follow-up.

Three evaluators per call:
- ToolUseEvaluator (free): checks the "at most 1 get_recent_metrics call" contract
- LLMInferenceEvaluator (free): checks LLM call latency/tokens/errors
- JudgeEvaluator with two custom Categorical metrics:
  - FormatCompliance (Compliant / MinorDeviation / MajorDeviation)
  - FactualGrounding (FullyGrounded / PartiallyGrounded / Hallucinated)

The judge uses AnthropicLLM with the agent's same key (LLM_API_KEY bridge); override the judge model via EVAL_JUDGE_MODEL.

When RAILTOWN_API_KEY is set, each EvaluationResult uploads to Conductr via railtownai.upload_agent_evaluation through evals.evaluate's payload_callback hook.

Sessions for extract_agent_data_points are staged in a request-scoped tempfile.TemporaryDirectory so concurrent /evaluate calls don't see each other's session payloads.

agent_selection=False + agents=[...] keeps evals.evaluate headless (it would otherwise hang on rich.prompt.Prompt.ask).

Bumps railtracks[visual] to >=1.4.0 for the tool-eval-requires-multiple-sessions bugfix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
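The request-scoped staging idea can be sketched with the standard library alone. The helper name `staged_sessions`, the file naming, and the JSON-per-file layout are assumptions; the commit message only states that a `tempfile.TemporaryDirectory` is created per request.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Dict, Iterator, List


@contextmanager
def staged_sessions(payloads: List[Dict[str, Any]]) -> Iterator[Path]:
    """Write each session payload to its own file in a per-request directory."""
    with tempfile.TemporaryDirectory(prefix="eval-sessions-") as tmp:
        root = Path(tmp)
        for index, payload in enumerate(payloads):
            (root / f"session-{index}.json").write_text(json.dumps(payload))
        # The directory and its files are deleted when the caller's block exits.
        yield root
```

Each call gets a fresh directory, so two overlapping requests never read each other's files, and cleanup happens even if scoring raises.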

The 1.4.0 release of railtracks raises ValueError in ToolUseEvaluator when fewer than 2 aggregate nodes exist per tool, which kills /evaluate calls with sample_size=1 (our default). The fix landed on main 2026-06-11 but hasn't shipped to PyPI yet, so we pin to the specific commit via PEP 508's git URL syntax. Swap back to a version range once 1.4.1+ releases.

Dockerfile gains a minimal `apt-get install git` step because python:3.10-slim lacks git and pip needs it to clone the pinned commit at build time.

Fix being pinned: RailtownAI/railtracks@4e4ed57

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The pinned commit installation failed during the ACR build:

    error: Multiple top-level packages discovered in a flat-layout: ['pdoc', 'packages'].

The railtracks repo is a monorepo (packages/, pdoc/, docs/, etc. at the root), so setuptools' flat-layout auto-discovery refuses to guess. The actual package lives at packages/railtracks/ with its own pyproject.toml + src/ layout. PEP 508's #subdirectory= URL fragment tells pip to enter that subdirectory before running the build.

The PyPI wheel sidesteps this entirely (it's pre-built), so this hint will go away when we revert to a >=1.4.1 version range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

POST /evaluate gains a second mode. When the body sets agent_run_id, the service fetches that single session from Conductr's platform API via railtownai.get_agent_runs([str(agent_run_id)]) (new in railtownai 2.0.14), stages the returned payload in the same request-scoped tempdir the fresh mode uses, and runs evaluators against just that session. sample_size is ignored in this mode and insights=[] in the response since no fresh generation happens. Fresh-mode behaviour is unchanged.

AgentRunsNotInitializedError / AgentRunFetchError bubble up to the global exception handler and surface in the 500 JSON body, a clean signal for the operator when CONDUCTR_PROJECT_PAT or CONDUCTR_PROJECT_ID is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The /evals/ namespace leaves room for future eval-related endpoints (list, retrieve a past evaluation by id, etc.) without crowding the top-level route table. /evals/run is the verb-y entry point that triggers an evaluation; siblings under /evals/ would be CRUD-y reads against persisted results.

Wire-level breaking change for the endpoint URL only; request and response shapes are unchanged. The renamed handler is now run_evaluation (the previous `evaluate` name shadowed the imported evals function in some contexts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

railtownai.upload_agent_evaluation catches all non-config exceptions and returns False without raising or logging, so HTTP-layer failures (token rejected by Conductr, ingestion endpoint unreachable, rail-engine-ingest errors) look indistinguishable from success at the call site. The SDK also explicitly suppresses rail-engine-ingest INFO logs (so the underlying HTTP response never reaches our root logger).

Three changes:

1. Gate on EVALUATIONS_API_TOKEN (the actual prerequisite) instead of railtownai.get_railtown_handler() (which only signals RAILTOWN_API_KEY init, a different feature).
2. Check the return value of upload_agent_evaluation. The previous code logged "uploaded to Conductr" on every call regardless of outcome.
3. Log loudly when the SDK returns False so silent failures become visible in container logs, with diagnostic guidance pointing at the suppressed rail-engine-ingest logger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
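The three changes amount to a small wrapper around the upload call. In this sketch the SDK function is passed in as a plain callable (`upload`) because its exact signature is not shown in the PR; the wrapper name, logger name, and log wording are likewise assumptions.

```python
import logging
import os
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger("daily_insight.evals")


def upload_with_diagnostics(
    upload: Callable[[Any], bool],
    result: Any,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Upload one evaluation result and report the real outcome."""
    env = os.environ if env is None else env
    if not env.get("EVALUATIONS_API_TOKEN"):
        # Gate on the actual prerequisite for evaluation uploads.
        logger.info("EVALUATIONS_API_TOKEN not set; skipping Conductr upload")
        return False

    if upload(result):
        logger.info("Evaluation uploaded to Conductr")
        return True

    # The callable returned False without raising, so make the failure visible.
    logger.error(
        "Evaluation upload returned False; check the token, the ingestion "
        "endpoint, and the suppressed rail-engine-ingest logger"
    )
    return False
```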

…ck hook

railtownai.upload_agent_evaluation is a sync function that drives its async implementation via asyncio.run(). When invoked from inside FastAPI's running event loop (which is where evals.evaluate's payload_callback fires from), asyncio.run() raises RuntimeError; the SDK catches that as a generic Exception and returns False without surfacing the cause. The unscheduled coroutine leaks a "coroutine was never awaited" warning to stderr.

Confirmed by container logs after the previous diagnostic-logging commit:

    RuntimeWarning: coroutine '_upload_agent_evaluation_async' was never awaited
      return False

Fix: drop the payload_callback hook, iterate evaluation_results after evals.evaluate() returns, and run each upload via asyncio.to_thread so the SDK gets the clean thread-local state (no running loop) it expects. Batches the whole list into a single SDK call for efficiency: one ingest session instead of one per result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
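The failure mode and the fix can be reproduced with the standard library. Here `upload_sync` is a hypothetical stand-in for a sync SDK function that wraps `asyncio.run()` and swallows exceptions; it is not the railtownai implementation.

```python
import asyncio
from typing import List


async def _upload_async(results: List[dict]) -> bool:
    await asyncio.sleep(0)  # stand-in for the real network call
    return True


def upload_sync(results: List[dict]) -> bool:
    """Hypothetical sync wrapper that drives the coroutine with asyncio.run()."""
    try:
        return asyncio.run(_upload_async(results))
    except Exception:
        # Hides "asyncio.run() cannot be called from a running event loop".
        return False


async def upload_inline(results: List[dict]) -> bool:
    # Runs on the event loop thread, so asyncio.run() inside refuses to start.
    return upload_sync(results)


async def upload_in_thread(results: List[dict]) -> bool:
    # The worker thread has no running loop, so asyncio.run() works there.
    return await asyncio.to_thread(upload_sync, results)
```

Under these assumptions, `asyncio.run(upload_inline([...]))` returns `False` (and CPython typically prints the "never awaited" warning for the abandoned coroutine), while `asyncio.run(upload_in_thread([...]))` returns `True`.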

Historical mode now produces names like:

    daily-insight-<agent-run-uuid>-20260612T233526Z

Fresh mode keeps the original timestamp-only shape since there's no single run id to attach (each of the N generated sessions has its own). Makes it trivial to grep Conductr's evaluation list for "did I evaluate this specific session"; previously the only way to correlate was via timestamp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…uccesses

- hadolint DL3008 on Dockerfile: add ignore directive for the apt-version pin rule. Pinning git to a specific apt version forces a Dockerfile edit on every base-image refresh, which is brittle for a transient dependency that only exists until we move back to a PyPI railtracks release.
- black on src/config/env.py: wrap the over-88-char REQUIRED_ENV_VARS list comprehension across multiple lines as black prefers.
- black on src/services/evaluation_service.py: the inverse; collapse the asyncio.to_thread call and the logger.info call back to single lines now that they fit under 88 chars.
- flake8 E501 on src/agents/insight_agent.py: split the 163-char prompt line at sentence-ish boundaries. Newlines inside a paragraph are semantically equivalent to spaces for the LLM.
- flake8 E501 on src/controllers/api.py: wrap the long pydantic Field description= strings using Python implicit string concatenation inside the description=(...) parens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
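For readers unfamiliar with the last technique: adjacent string literals inside parentheses are joined at compile time, so a long description can be wrapped without changing its value. The constant name and wording below are made up for illustration and are not taken from `api.py`.

```python
# Hypothetical description text; only the wrapping technique matters here.
SAMPLE_SIZE_DESCRIPTION = (
    "How many fresh /insight runs to generate before scoring. "
    "Defaults to 1 and is capped at 5."
)
```

The trailing space inside the first literal matters, because no separator is inserted between the pieces.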

The previous attempt placed the directive earlier in the comment block, with two more explanatory comment lines between it and the RUN instruction. Hadolint binds the ignore comment to the *next instruction*; intervening comments break that binding, so the warning kept firing in CI. Reorder so the ignore comment is the final comment before RUN; the explanation stays above it in the same block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

railtracks 1.4.1 (PyPI, 2026-06-15) includes the ToolUseEvaluator single-session fix from commit 4e4ed57 (2026-06-11) that we'd been pinning. Reverting to a version range:

- pyproject.toml: drop the git+url with #subdirectory hint, restore the simple `railtracks[visual]>=1.4.1` line.
- Dockerfile: drop the apt-get install git layer and its hadolint DL3008 ignore directive. git was only there to clone railtracks from source during the pin and is no longer needed for a PyPI wheel.

Smaller image, faster builds, simpler dep graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mbrailtown · 2026-06-17T18:57:40Z

Python/daily-insight/src/controllers/api.py could provide the basis for a kickstart example, so I would appreciate feedback on that in particular, @Amir-R25.

The Conductr Hosted endpoint now POSTs { agent_run_ids: [...] } so the agent has to accept a list. Historical mode now fetches all named runs in one get_agent_runs call and scores them under a single evaluation_name; the batch name is daily-insight-batch<N>-<ts> when there's more than one id (single-id keeps the existing id-in-name form).
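The naming rules can be summarized in a small helper. This is a sketch, not the service's code: the function name `evaluation_name` is hypothetical, the timestamp format is inferred from the `20260612T233526Z` example earlier in this PR, and the Fresh-mode shape (`daily-insight-<ts>`) is an assumption based on the "timestamp-only" description.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def evaluation_name(
    run_ids: Sequence[str] = (), now: Optional[datetime] = None
) -> str:
    """Build an evaluation name for Fresh, single-id, or batch mode."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    if len(run_ids) > 1:
        # Batch: record how many runs were scored together.
        return f"daily-insight-batch{len(run_ids)}-{stamp}"
    if len(run_ids) == 1:
        # Single id: keep the id in the name so it can be grepped.
        return f"daily-insight-{run_ids[0]}-{stamp}"
    # Fresh mode: assumed timestamp-only shape.
    return f"daily-insight-{stamp}"
```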

Adds ConfigDict(extra="forbid") so misspelled keys (e.g. the old singular agent_run_id) return 422 instead of silently dropping into Fresh mode. Limits agent_run_ids to 1..10 ids and adds a model_validator that rejects bodies setting both sample_size and agent_run_ids; it uses model_fields_set so the default sample_size=1 doesn't trip the XOR when only agent_run_ids is provided. Plus a black reformat in evaluation_service.py: one stray blank line removed from the import block, one f-string assignment unwrapped from parens.
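The same rules, restated in plain Python over a raw request dict. This is not the Pydantic model from the PR: `validate_eval_body` and its return shape are hypothetical, and a `ValueError` stands in for the 422 response. Checking which keys are present in the dict plays the role that `model_fields_set` plays in the real model.

```python
from typing import Any, Dict

ALLOWED_KEYS = {"sample_size", "agent_run_ids"}
MAX_RUN_IDS = 10


def validate_eval_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the request rules and report which mode the body selects."""
    unknown = set(body) - ALLOWED_KEYS
    if unknown:
        # Counterpart of extra="forbid", e.g. the old singular agent_run_id.
        raise ValueError(f"unexpected keys: {sorted(unknown)}")

    if "sample_size" in body and "agent_run_ids" in body:
        # Only keys the caller actually sent count, so a default never trips this.
        raise ValueError("set either sample_size or agent_run_ids, not both")

    if "agent_run_ids" in body:
        ids = body["agent_run_ids"]
        if not 1 <= len(ids) <= MAX_RUN_IDS:
            raise ValueError("agent_run_ids must contain 1 to 10 ids")
        return {"mode": "historical", "agent_run_ids": list(ids)}

    return {"mode": "fresh", "sample_size": body.get("sample_size", 1)}
```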

mbrailtown and others added 17 commits June 12, 2026 11:04

mbrailtown assigned jbueza-railtownai and Amir-R25 Jun 17, 2026

jbueza-railtownai approved these changes Jun 17, 2026

mbrailtown added 2 commits June 19, 2026 11:49

mbrailtown merged commit cc8608f into main Jun 19, 2026
13 checks passed

mbrailtown deleted the matthew/csharp-example branch June 19, 2026 20:50

mbrailtown merged 19 commits into main from matthew/csharp-example

3 participants
