Agentify the status page insights #12
Merged
Ports the CSharp DailyInsightService functionality to a standalone Python FastAPI service. Drives an Anthropic Claude agent through Railtracks with a single `@rt.function_node` tool that calls the Railengine Python SDK directly (replaces the C# MCP attachment). POST /insight returns the same one-line-per-metric plain-text summary so the existing /api/insight card can render it unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t:AgentUrl
When DailyInsight:AgentUrl is set, the 24h loop POSTs to {AgentUrl}/insight
instead of calling Anthropic + MCP inline; Anthropic:ApiKey becomes optional
since the agent owns the LLM call. Leave AgentUrl empty (or unset) to keep the
existing inline path unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ken is set
Lets the C# app talk to a daily-insight agent endpoint sitting behind a bearer-auth reverse proxy. The token is read from configuration and attached as Authorization: Bearer <token> on every /insight POST when non-empty; when left blank, the request goes unauthenticated, matching the existing local-dev behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
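The token rule above can be sketched as follows. This is an illustration in Python rather than the C# caller's actual code; `build_insight_request` and its parameters are hypothetical names, and the request is sent without a body because the thread does not describe one.

```python
import urllib.request


def build_insight_request(agent_url: str, token: str = "") -> urllib.request.Request:
    """Build the POST {agent_url}/insight request (hypothetical helper).

    The Authorization header is attached only when the token is non-empty,
    so a blank token produces an unauthenticated request.
    """
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{agent_url.rstrip('/')}/insight",
        headers=headers,
        method="POST",
    )


# Usage: nothing is sent until the request is passed to urlopen().
request = build_insight_request("http://localhost:8000/", token="s3cret")
print(request.full_url, request.get_header("Authorization"))
```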
… env vars
Dockerfile installs the daily_insight package via pyproject.toml and runs uvicorn on :8000, the standard container shape for any deploy target.
configure_runtime_env bridges LLM_API_KEY <-> ANTHROPIC_API_KEY at startup so the same setting works under either name; the Anthropic SDK still reads ANTHROPIC_API_KEY internally.
INSIGHT_MODEL renamed to the conventional LLM_MODEL. README leads with the provider-neutral names and documents the mapping in one row instead of two.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
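A minimal sketch of the two-way key bridge described above. Only the function name and the two variable names come from the commit message; the body, the `env` parameter, and the choice to leave both values untouched when both are already set are assumptions.

```python
import os
from typing import MutableMapping


def configure_runtime_env(env: MutableMapping[str, str] = os.environ) -> None:
    """Copy whichever of the two key names is set onto the one that is missing."""
    llm_key = env.get("LLM_API_KEY")
    anthropic_key = env.get("ANTHROPIC_API_KEY")
    if llm_key and not anthropic_key:
        # The Anthropic SDK reads ANTHROPIC_API_KEY, so mirror the neutral name.
        env["ANTHROPIC_API_KEY"] = llm_key
    elif anthropic_key and not llm_key:
        env["LLM_API_KEY"] = anthropic_key
    # Assumption: when both are set, neither value is overwritten.


# Usage with a plain dict standing in for the process environment.
settings = {"LLM_API_KEY": "example-key"}
configure_runtime_env(settings)
print(settings["ANTHROPIC_API_KEY"])
```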
Two changes that together turn opaque 500s into actionable diagnostics:
1. railtownai.init() in the FastAPI lifespan when RAILTOWN_API_KEY is set. The SDK attaches a RailtownHandler to Python's root logger, so any logger.exception/error call downstream ships to Railtown automatically. No-op when the key is unset, so the agent runs normally.
2. @app.exception_handler(Exception) that logs the traceback and returns a JSON 500 body with the exception type and message instead of FastAPI's default plain-text "Internal Server Error". Replaces the per-endpoint try/except in /insight so unhandled errors from any route get the same treatment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
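The second change can be sketched without the web framework: a helper that logs the traceback and builds the status code and JSON body a global handler would return. The helper name, the logger name, and the `error` / `message` field names are assumptions; in the service this logic would sit inside the function registered with `@app.exception_handler(Exception)`.

```python
import logging
from typing import Any, Dict, Tuple

logger = logging.getLogger("daily_insight")  # hypothetical logger name


def error_response(exc: Exception) -> Tuple[int, Dict[str, Any]]:
    """Log the traceback and return (status_code, json_body) for an unhandled error."""
    # Passing the exception as exc_info makes logging emit its traceback.
    logger.error("Unhandled error while serving request", exc_info=exc)
    return 500, {"error": type(exc).__name__, "message": str(exc)}


# Usage: what a caller would see for a failed SDK lookup.
try:
    raise KeyError("RAILENGINE_PAT")
except KeyError as caught:
    status, body = error_response(caught)
    print(status, body)
```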
Switches InsightService from rt.Flow(...).ainvoke() to the rt.Session + rt.call() pattern so we can grab session.payload() and hand it to railtownai.upload_agent_run(). The payload contains the nodes / edges / steps that drive the Railtracks viz UI in Conductr, which is useful for inspecting which tools the agent called and what prompts it used.
Skipped silently when railtownai.init() hasn't run (no RAILTOWN_API_KEY) so local invocations without observability keep working unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ailtracks evaluators
Self-contained smoke test for the deployed agent. Body:
{ "sample_size": 1, "agent_run_id": "<uuid-or-null>" }
sample_size (default 1, capped at 5) drives how many fresh /insight runs the
endpoint generates before scoring. agent_run_id is accepted for forward
compatibility — currently logged and ignored, will identify a historical
session to evaluate in a follow-up.
Three evaluators per call:
- ToolUseEvaluator (free) — checks the "at most 1 get_recent_metrics call" contract
- LLMInferenceEvaluator (free) — checks LLM call latency/tokens/errors
- JudgeEvaluator with two custom Categorical metrics:
FormatCompliance (Compliant / MinorDeviation / MajorDeviation)
FactualGrounding (FullyGrounded / PartiallyGrounded / Hallucinated)
The judge uses AnthropicLLM with the agent's same key (LLM_API_KEY bridge);
override the judge model via EVAL_JUDGE_MODEL. When RAILTOWN_API_KEY is set,
each EvaluationResult uploads to Conductr via railtownai.upload_agent_evaluation
through evals.evaluate's payload_callback hook.
Sessions for extract_agent_data_points are staged in a request-scoped
tempfile.TemporaryDirectory so concurrent /evaluate calls don't see each
other's session payloads. agent_selection=False + agents=[...] keeps
evals.evaluate headless (it would otherwise hang on rich.prompt.Prompt.ask).
Bumps railtracks[visual] to >=1.4.0 for the tool-eval-requires-multiple-
sessions bugfix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
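The request-scoped staging and the sample_size cap can be sketched with the standard library alone. The helper names, the `session-<n>.json` file naming, and the lower bound of 1 are assumptions; the real service hands the staged directory to the Railtracks evaluators, which is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple

MAX_SAMPLE_SIZE = 5  # "capped at 5" per the commit message


def clamp_sample_size(requested: int = 1) -> int:
    """Keep sample_size within 1..MAX_SAMPLE_SIZE (lower bound is an assumption)."""
    return max(1, min(requested, MAX_SAMPLE_SIZE))


def stage_sessions(payloads: List[Dict[str, Any]], directory: Path) -> List[Path]:
    """Write each session payload to its own JSON file inside `directory`."""
    paths = []
    for index, payload in enumerate(payloads):
        path = directory / f"session-{index}.json"
        path.write_text(json.dumps(payload), encoding="utf-8")
        paths.append(path)
    return paths


def run_request(payloads: List[Dict[str, Any]]) -> Tuple[str, List[str]]:
    """One directory per request, so concurrent requests never share files."""
    with tempfile.TemporaryDirectory(prefix="eval-") as tmp:
        staged = stage_sessions(payloads, Path(tmp))
        # Evaluators would read the staged files here, before the directory
        # is deleted on leaving the `with` block.
        return tmp, [p.name for p in staged]
```

Because every call creates and removes its own directory, two overlapping requests cannot pick up each other's staged sessions.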
The 1.4.0 release of railtracks raises ValueError in ToolUseEvaluator when fewer than 2 aggregate nodes exist per tool, which kills /evaluate calls with sample_size=1 (our default). The fix landed on main 2026-06-11 but hasn't shipped to PyPI yet, so we pin to the specific commit via PEP 508's git URL syntax. Swap back to a version range once 1.4.1+ releases.
Dockerfile gains a minimal `apt-get install git` step because python:3.10-slim lacks git and pip needs it to clone the pinned commit at build time.
Fix being pinned: RailtownAI/railtracks@4e4ed57
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pinned commit installation failed during the ACR build:
error: Multiple top-level packages discovered in a flat-layout:
['pdoc', 'packages'].
The railtracks repo is a monorepo (packages/, pdoc/, docs/, etc. at the
root), so setuptools' flat-layout auto-discovery refuses to guess. The
actual package lives at packages/railtracks/ with its own pyproject.toml
+ src/ layout. PEP 508's #subdirectory= URL fragment tells pip to enter
that subdirectory before running the build.
The PyPI wheel sidesteps this entirely (it's pre-built), so this hint
will go away when we revert to a >=1.4.1 version range.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
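For reference, the shape of a direct-reference requirement with a `#subdirectory=` fragment can be composed like this. The helper is hypothetical, and the GitHub URL is an assumption inferred from the `RailtownAI/railtracks@4e4ed57` reference; the abbreviated hash is shown as it appears in this thread, whereas a real pin would normally use the full commit hash.

```python
from typing import Sequence


def git_direct_reference(
    name: str,
    extras: Sequence[str],
    repo_url: str,
    ref: str,
    subdirectory: str = "",
) -> str:
    """Compose 'name[extras] @ git+<url>@<ref>#subdirectory=<path>'."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    fragment = f"#subdirectory={subdirectory}" if subdirectory else ""
    return f"{name}{extras_part} @ git+{repo_url}@{ref}{fragment}"


# Usage: the string that would go in pyproject.toml's dependency list.
print(
    git_direct_reference(
        "railtracks",
        ["visual"],
        "https://github.com/RailtownAI/railtracks",  # assumed repository URL
        "4e4ed57",
        "packages/railtracks",
    )
)
```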
POST /evaluate gains a second mode. When the body sets agent_run_id, the service fetches that single session from Conductr's platform API via railtownai.get_agent_runs([str(agent_run_id)]) (new in railtownai 2.0.14), stages the returned payload in the same request-scoped tempdir the fresh mode uses, and runs evaluators against just that session. sample_size is ignored in this mode and insights=[] in the response since no fresh generation happens. Fresh-mode behaviour is unchanged.
AgentRunsNotInitializedError / AgentRunFetchError bubble up to the global exception handler and surface in the 500 JSON body, which gives the operator a clean signal when CONDUCTR_PROJECT_PAT or CONDUCTR_PROJECT_ID is missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /evals/ namespace leaves room for future eval-related endpoints (list, retrieve a past evaluation by id, etc.) without crowding the top-level route table. /evals/run is the verb-y entry point that triggers an evaluation; siblings under /evals/ would be CRUD-y reads against persisted results.
Wire-level breaking change for the endpoint URL only; request and response shapes are unchanged. The renamed handler is now run_evaluation (the previous `evaluate` name shadowed the imported evals function in some contexts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
railtownai.upload_agent_evaluation catches all non-config exceptions and returns False without raising or logging, so HTTP-layer failures (token rejected by Conductr, ingestion endpoint unreachable, rail-engine-ingest errors) look indistinguishable from success at the call site. The SDK also explicitly suppresses rail-engine-ingest INFO logs (so the underlying HTTP response never reaches our root logger).
Three changes:
1. Gate on EVALUATIONS_API_TOKEN (the actual prerequisite) instead of railtownai.get_railtown_handler() (which only signals RAILTOWN_API_KEY init, a different feature).
2. Check the return value of upload_agent_evaluation. The previous code logged "uploaded to Conductr" on every call regardless of outcome.
3. Log loudly when the SDK returns False so silent failures become visible in container logs, with diagnostic guidance pointing at the suppressed rail-engine-ingest logger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
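The three changes amount to a small wrapper around the upload call. In this sketch the `upload` argument is a stand-in callable that returns a bool, as the commit message says the SDK function does; the wrapper name, the logger name, the return strings, and the log wording are all assumptions.

```python
import logging
import os
from typing import Any, Callable, Mapping

logger = logging.getLogger("daily_insight.evals")  # hypothetical logger name


def upload_with_diagnostics(
    upload: Callable[[Any], bool],
    result: Any,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Gate on the token, check the bool return value, and log failures loudly."""
    if not env.get("EVALUATIONS_API_TOKEN"):
        logger.info("EVALUATIONS_API_TOKEN is not set; skipping evaluation upload")
        return "skipped"
    if upload(result):
        logger.info("Evaluation uploaded to Conductr")
        return "uploaded"
    # The SDK swallowed the underlying error, so point the operator at the
    # logger it suppresses.
    logger.error(
        "Evaluation upload returned False; raise the rail-engine-ingest "
        "logger level to see the underlying HTTP response"
    )
    return "failed"


# Usage with a stub that always fails, as a rejected token would.
print(upload_with_diagnostics(lambda _: False, {}, {"EVALUATIONS_API_TOKEN": "t"}))
```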
…ck hook
railtownai.upload_agent_evaluation is a sync function that drives its async
implementation via asyncio.run(). When invoked from inside FastAPI's running
event loop (which is where evals.evaluate's payload_callback fires from),
asyncio.run() raises RuntimeError — the SDK catches that as a generic
Exception and returns False without surfacing the cause. The unscheduled
coroutine leaks a "coroutine was never awaited" warning to stderr.
Confirmed by container logs after the previous diagnostic-logging commit:
RuntimeWarning: coroutine '_upload_agent_evaluation_async' was never awaited
return False
Fix: drop the payload_callback hook, iterate evaluation_results after
evals.evaluate() returns, and run each upload via asyncio.to_thread so the
SDK gets the clean thread-local state (no running loop) it expects. Batches
the whole list into a single SDK call for efficiency — one ingest session
instead of one per result.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
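The failure and the fix can be reproduced with the standard library alone. `sync_upload` below is a stand-in that mimics the behaviour described for the SDK (a sync facade over `asyncio.run()` that swallows exceptions and returns False); it is not the SDK's code, and the explicit `coro.close()` is added only to keep this demo from emitting the "never awaited" warning.

```python
import asyncio
from typing import List, Tuple


async def _upload_async(batch: List[dict]) -> int:
    """Stand-in for the SDK's async implementation."""
    await asyncio.sleep(0)
    return len(batch)


def sync_upload(batch: List[dict]) -> bool:
    """Sync facade: drives the coroutine with asyncio.run(), False on any error."""
    coro = _upload_async(batch)
    try:
        asyncio.run(coro)
        return True
    except Exception:
        coro.close()  # demo only: suppress the "never awaited" warning
        return False


async def handler(batch: List[dict]) -> Tuple[bool, bool]:
    # Called on the event-loop thread: asyncio.run() raises RuntimeError
    # because a loop is already running, so the facade returns False.
    direct = sync_upload(batch)
    # Called in a worker thread: no running loop there, so asyncio.run() works.
    threaded = await asyncio.to_thread(sync_upload, batch)
    return direct, threaded


print(asyncio.run(handler([{"metric": "uptime"}])))  # (False, True)
```

Passing the whole list in one `asyncio.to_thread` call, rather than one call per result, corresponds to the single-ingest-session batching mentioned above.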
Historical mode now produces names like:
daily-insight-<agent-run-uuid>-20260612T233526Z
Fresh mode keeps the original timestamp-only shape since there's no single run id to attach (each of the N generated sessions has its own). Makes it trivial to grep Conductr's evaluation list for "did I evaluate this specific session"; previously the only way to correlate was via timestamp.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
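A sketch of a name builder that produces the historical shape shown above. The function name is hypothetical, and the fresh-mode shape `daily-insight-<timestamp>` is an assumption, since the thread only calls it "timestamp-only".

```python
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID


def evaluation_name(
    agent_run_id: Optional[UUID] = None,
    now: Optional[datetime] = None,
) -> str:
    """Build an evaluation name; include the run id only in historical mode."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    if agent_run_id is not None:
        return f"daily-insight-{agent_run_id}-{stamp}"
    return f"daily-insight-{stamp}"  # assumed fresh-mode shape


# Usage with a fixed clock so the output is reproducible.
fixed = datetime(2026, 6, 12, 23, 35, 26, tzinfo=timezone.utc)
print(evaluation_name(UUID("12345678-1234-5678-1234-567812345678"), fixed))
```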
…uccesses
- hadolint DL3008 on Dockerfile: add ignore directive for the apt-version pin rule. Pinning git to a specific apt version forces a Dockerfile edit on every base-image refresh, which is brittle for a transient dependency that only exists until we move back to a PyPI railtracks release.
- black on src/config/env.py: wrap the over-88-char REQUIRED_ENV_VARS list comprehension across multiple lines as black prefers.
- black on src/services/evaluation_service.py: the inverse; collapse the asyncio.to_thread call and the logger.info call back to single lines now that they fit under 88 chars.
- flake8 E501 on src/agents/insight_agent.py: split the 163-char prompt line at sentence-ish boundaries. Newlines inside a paragraph are semantically equivalent to spaces for the LLM.
- flake8 E501 on src/controllers/api.py: wrap the long pydantic Field description= strings using Python implicit string concatenation inside the description=(...) parens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
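The last item relies on adjacent string literals being joined at compile time. A small illustration, with hypothetical description text:

```python
# Adjacent literals inside parentheses are concatenated into one string,
# so each source line stays short without adding newlines to the value.
description = (
    "How many fresh /insight runs to generate before scoring. "
    "Defaults to 1 and is capped at 5."
)

single_line = "How many fresh /insight runs to generate before scoring. Defaults to 1 and is capped at 5."
print(description == single_line)  # True
```

The trailing space inside the first literal matters: nothing is inserted between the pieces when they are joined.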
The previous attempt placed the directive earlier in the comment block, with two more explanatory comment lines between it and the RUN instruction. Hadolint binds the ignore comment to the *next instruction*; intervening comments break that binding, so the warning kept firing in CI.
Reorder so the ignore comment is the final comment before RUN; the explanation stays above it in the same block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
railtracks 1.4.1 (PyPI, 2026-06-15) includes the ToolUseEvaluator single-session fix from commit 4e4ed57 (2026-06-11) that we'd been pinning. Reverting to a version range:
- pyproject.toml: drop the git+url with #subdirectory hint, restore the simple `railtracks[visual]>=1.4.1` line.
- Dockerfile: drop the apt-get install git layer and its hadolint DL3008 ignore directive. git was only there to clone railtracks from source during the pin and is no longer needed for a PyPI wheel.
Smaller image, faster builds, simpler dep graph.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python/daily-insight/src/controllers/api.py could provide the basis for a kickstart example, so I would appreciate feedback on that in particular, @Amir-R25.
jbueza-railtownai approved these changes on Jun 17, 2026
The Conductr Hosted endpoint now POSTs { agent_run_ids: [...] } so the
agent has to accept a list. Historical mode now fetches all named runs
in one get_agent_runs call and scores them under a single
evaluation_name; the batch name is daily-insight-batch<N>-<ts> when
there's more than one id (single-id keeps the existing id-in-name form).
Adds ConfigDict(extra="forbid") so misspelled keys (e.g. the old singular agent_run_id) return 422 instead of silently dropping into Fresh mode. Caps agent_run_ids at 1 to 10 ids and adds a model_validator that rejects bodies setting both sample_size and agent_run_ids; it uses model_fields_set so the default sample_size=1 doesn't trip the XOR when only agent_run_ids is provided.
Plus a black reformat in evaluation_service.py: one stray blank line removed from the import block, one f-string assignment unwrapped from parens.
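The validation rules can be sketched on a plain dict, without pydantic. Checking which keys are present in the raw body plays the role of `model_fields_set`; the function and exception names, the returned shape, and the exact XOR condition are assumptions rather than the service's actual model.

```python
from typing import Any, Dict

ALLOWED_KEYS = {"sample_size", "agent_run_ids"}
MAX_AGENT_RUN_IDS = 10


class BodyError(ValueError):
    """Stand-in for the 422 the real service returns."""


def validate_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the three rules: no unknown keys, 1 to 10 ids, not both fields."""
    unknown = set(body) - ALLOWED_KEYS
    if unknown:  # e.g. the old singular agent_run_id
        raise BodyError(f"unknown keys: {sorted(unknown)}")
    ids = body.get("agent_run_ids")
    # Key presence, not value, decides the XOR, so the default sample_size=1
    # never conflicts with a body that only names agent_run_ids.
    if ids is not None and "sample_size" in body:
        raise BodyError("set either sample_size or agent_run_ids, not both")
    if ids is not None:
        if not 1 <= len(ids) <= MAX_AGENT_RUN_IDS:
            raise BodyError("agent_run_ids must contain 1 to 10 ids")
        return {"mode": "historical", "agent_run_ids": list(ids)}
    return {"mode": "fresh", "sample_size": body.get("sample_size", 1)}


# Usage: a misspelled key is rejected instead of falling back to fresh mode.
try:
    validate_body({"agent_run_id": "abc"})
except BodyError as err:
    print(err)
```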
Updated C# app so it can use a remote agent instead of the inline API calls.
Added a Python example agent that uses railtracks and can work as the remote agent for the C# app.