feat(overseer): read-only Overseer entity + 7 query tools + voice route #56

Open
heavygee wants to merge 1 commit into fix/overseer-inbox-stale-noise from feat/overseer-readonly-entity

@heavygee

Owner

Step 3 of the Overseer build sequence: read-only entity + tools + voice route

Stacked on fix/overseer-inbox-stale-noise (events #22 + inbox #23 + stale-noise fix). PR diff is exactly the 12 files of this step.

What this adds

  • Overseer as a real conversational entity in the hub: stable identity, system prompt, dedicated voice surface. Inform-only - no dispatch, no confirm, no state mutation (dispatch-with-confirm is Step 4).
  • 7 read-only tools, all unit-tested against the live substrate:
    • query_events - events stream by session/project/type/severity/time/status/attention_candidate
    • query_inbox - candidates + surfaced + held
    • get_session_state - hub-observed state + last activity + tool-call recency + worker_reported_state
    • get_session_recent_output - last N transcript chunks
    • get_worker_health - combined reported/observed/inferred (contracts §2)
    • explain_priority - provenance trail, reuses the stored reason_for_priority (no reverse-engineering)
    • list_active_workers - roster by project/state/age
  • convo_turn writeback: voice/text conversation turns recorded to the events stream for provenance.
  • Voice route (GET /api/overseer/voice): dedicated Overseer surface that consumes the existing stt/tts voice substrate (feat/overseer-stt-tts-endpoints, feat/overseer-voice-persistence) - does not reimplement it. Chrome-button relocation is Step 5 (out of scope).

Store changes (additive only)

  • queryEvents adds project/sourceKind/severity/time filters; existing query shapes unchanged.
  • Inbox list gains statuses[] + category filters.
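
One way to picture "additive only" filters: every new filter field is optional, and an omitted field does not constrain the result, so existing callers keep their behavior. The sketch below is an in-memory illustration under that assumption; `EventRow`, `EventFilter`, `matchesFilter`, and `filterEvents` are hypothetical names, not the store's actual API.

```typescript
// Hypothetical row and filter shapes; the PR's real store types are not shown here.
interface EventRow {
  sessionId: string
  project: string
  sourceKind: string
  severity: string
  createdAt: number // epoch milliseconds (assumed)
}

interface EventFilter {
  sessionId?: string
  project?: string
  sourceKind?: string
  severity?: string
  since?: number
  until?: number
}

// A field that is left undefined never rejects a row.
function matchesFilter(row: EventRow, f: EventFilter): boolean {
  if (f.sessionId !== undefined && row.sessionId !== f.sessionId) return false
  if (f.project !== undefined && row.project !== f.project) return false
  if (f.sourceKind !== undefined && row.sourceKind !== f.sourceKind) return false
  if (f.severity !== undefined && row.severity !== f.severity) return false
  if (f.since !== undefined && row.createdAt < f.since) return false
  if (f.until !== undefined && row.createdAt > f.until) return false
  return true
}

function filterEvents(rows: EventRow[], f: EventFilter = {}): EventRow[] {
  return rows.filter((row) => matchesFilter(row, f))
}
```

A caller that passes only `sessionId`, as before the change, gets the same rows as it did without the new fields.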

Protocol layer

shared/src/overseerEntity.ts: identity, tool catalog with zod arg schemas, system prompt, worker-state derivation (mapNotifyStatusToWorkerState / mapEventTypeToWorkerState / deriveObservedWorkerState / inferWorkerState), convo_turn builder.

Tests

24 new tests pass (protocol unit + OverseerEntity tools + routes). No regressions in existing substrate tests (systemEvents, inboxItems, store).

Known: pre-existing base breakage (not from this PR)

The souped base carries unrelated breakage from other peers' incomplete merges - hub/src/fcm/fcmNotificationChannel.ts imports a missing ../notifications/modelErrorCopy, plus a web/src/hooks/useSSE.ts type error. These are the only bun typecheck/bun run test failures and are outside this PR's surface. My 12 files are type-clean and fully tested.

Gating note

Persona + voice-answer-quality tuning is intentionally not in this PR - it gates on the replay harness (now landed on feat/overseer-replay-harness). This PR is the independent plumbing.

Made with Cursor

Overseer is now a real conversational entity in the hub with a stable
identity, system prompt, and a dedicated voice surface (consumes the
existing stt/tts voice substrate; does not reimplement it). It is
inform-only at this stage - no dispatch, no confirm, no state mutation.

7 read-only tools, all unit-tested against the live substrate:
  query_events, query_inbox, get_session_state, get_session_recent_output,
  get_worker_health, explain_priority (reuses stored reason_for_priority),
  list_active_workers.

Worker health derives reported/observed/inferred state per contracts §2.
Voice conversation turns are written back to the events stream as
event_type=convo_turn for provenance.

Store: additive queryEvents (project/sourceKind/severity/time filters) and
inbox status/category filters - no changes to existing query shapes.

Protocol layer (shared/src/overseerEntity.ts): identity, tool catalog with
zod arg schemas, system prompt, worker-state derivation helpers, convo_turn
builder. 24 new tests pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e21a3c584c

Comment on lines +171 to +172
getSession: (sessionId) => this.getSession(sessionId),
getSessions: () => this.getSessions()

P1: Scope Overseer reads to the caller namespace

When an authenticated client calls any /api/overseer tool, the entity resolves sessions through the global getSession()/getSessions() accessors instead of the JWT namespace. In a multi-namespace hub, list_active_workers enumerates every tenant's sessions, and the returned IDs can then be passed to get_session_recent_output or get_worker_health to read another namespace's transcript and state. Pass c.get('namespace') into tool dispatch and use namespace-scoped accessors or reject cross-namespace session IDs.
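
A minimal sketch of the namespace-scoped accessor option, assuming each session record carries its namespace. The `Session` and `SessionAccessors` shapes and the `scopeToNamespace` helper are hypothetical; the hub's real types are not shown in this thread.

```typescript
// Hypothetical shapes for illustration only.
interface Session {
  id: string
  namespace: string
}

interface SessionAccessors {
  getSession(sessionId: string): Session | undefined
  getSessions(): Session[]
}

// Wrap the global accessors so every read is limited to one namespace.
function scopeToNamespace(global: SessionAccessors, namespace: string): SessionAccessors {
  return {
    getSession: (sessionId) => {
      const session = global.getSession(sessionId)
      // A cross-namespace ID is treated exactly like a missing session.
      return session && session.namespace === namespace ? session : undefined
    },
    getSessions: () => global.getSessions().filter((s) => s.namespace === namespace),
  }
}
```

With a wrapper like this, tool dispatch could build the scoped accessors once per request from the caller's namespace and hand them to the entity in place of the global ones.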


const event = engine.getOverseer().recordConvoTurn({
operatorText: parsed.data.operatorText,
overseerText: parsed.data.overseerText,
relatedSessionId: parsed.data.relatedSessionId ?? null,

P1: Validate convo-turn session ownership

When a caller includes relatedSessionId, this write path records it without resolving that session against c.get('namespace'), unlike the existing session routes that use requireSessionFromParam. If a namespace A user knows a namespace B session ID, the FK accepts it and the convo_turn is stored under B's session history, polluting another tenant's audit context. Validate ownership before recording the relation or drop it.
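
The ownership check could look roughly like the sketch below. It assumes a lookup that returns a session with its namespace; `OwnedSession`, `SessionLookup`, and `resolveRelatedSession` are hypothetical names, and the 404 status is one possible choice (it avoids confirming that a foreign session ID exists).

```typescript
// Hypothetical shapes for illustration only.
interface OwnedSession {
  id: string
  namespace: string
}

type SessionLookup = (sessionId: string) => OwnedSession | undefined

type RelatedSessionCheck =
  | { ok: true; relatedSessionId: string | null }
  | { ok: false; status: 404 }

// Resolve relatedSessionId against the caller's namespace before recording it.
function resolveRelatedSession(
  lookup: SessionLookup,
  callerNamespace: string,
  relatedSessionId: string | null | undefined,
): RelatedSessionCheck {
  // No relation requested: nothing to validate.
  if (relatedSessionId == null) return { ok: true, relatedSessionId: null }
  const session = lookup(relatedSessionId)
  if (!session || session.namespace !== callerNamespace) {
    return { ok: false, status: 404 }
  }
  return { ok: true, relatedSessionId: session.id }
}
```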


return {
items,
candidates: items.filter((item) => item.status === 'new'),
surfaced: items.filter((item) => item.status === 'surfaced'),

P2: Exclude Overseer turns from worker activity

When an Overseer conversation turn is saved with relatedSessionId, it creates a latest event for that session with sourceKind: 'overseer'. These health/state calculations fetch the latest event without filtering to worker or hub-observed activity, so asking the Overseer about a stale session makes lastActivityAt become the conversation timestamp and silenceMs near zero, masking stale workers in get_session_state, get_worker_health, and list_active_workers. Exclude convo_turn or overseer-sourced events from activity calculations.
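
A small sketch of the suggested exclusion, written against an in-memory event list. The `ActivityEvent` shape and its `createdAt` field are assumptions; only the `convo_turn` event type and the `overseer` source kind come from this thread.

```typescript
// Hypothetical event shape; createdAt is assumed to be epoch milliseconds.
interface ActivityEvent {
  eventType: string
  sourceKind: string
  createdAt: number
}

// Overseer conversation turns are provenance records, not worker activity.
function isWorkerActivity(event: ActivityEvent): boolean {
  return event.eventType !== 'convo_turn' && event.sourceKind !== 'overseer'
}

// Latest timestamp among events that count as activity, or null if none do.
function lastActivityAt(events: ActivityEvent[]): number | null {
  let latest: number | null = null
  for (const event of events) {
    if (!isWorkerActivity(event)) continue
    if (latest === null || event.createdAt > latest) latest = event.createdAt
  }
  return latest
}
```

In a store-backed implementation the same predicate would more likely be pushed into the query itself, so that "latest event" already means "latest non-Overseer event".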


Comment on lines +162 to +163
const lastToolCall = this.events.query({ sessionId, eventType: 'tool_call', limit: 1 })[0]
    ?? this.events.query({ sessionId, eventType: 'tool_result', limit: 1 })[0]

P2: Compare tool call and result timestamps

When a session has any tool_call event, this expression never looks at tool_result, even if the result is newer than the call it completed. lastToolCallAgeMs can therefore report an old start time immediately after fresh tool activity, misleading the Overseer about recency during long-running tools or after a just-finished command. Query both event types together or compare the two timestamps before selecting one.
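
The "compare the two timestamps" option can be sketched as follows, assuming the caller has already fetched the latest `tool_call` and the latest `tool_result` separately. `TimedEvent`, `newestOf`, and `lastToolActivityAgeMs` are hypothetical names.

```typescript
// Hypothetical event shape; createdAt is assumed to be epoch milliseconds.
interface TimedEvent {
  eventType: string
  createdAt: number
}

// Pick whichever event is newer; either side may be missing.
function newestOf(a: TimedEvent | undefined, b: TimedEvent | undefined): TimedEvent | undefined {
  if (!a) return b
  if (!b) return a
  return b.createdAt > a.createdAt ? b : a
}

// Age of the most recent tool activity, whether it was a call or a result.
function lastToolActivityAgeMs(
  latestCall: TimedEvent | undefined,
  latestResult: TimedEvent | undefined,
  now: number,
): number | null {
  const newest = newestOf(latestCall, latestResult)
  return newest ? Math.max(0, now - newest.createdAt) : null
}
```

Unlike the `??` chain, this never lets an older `tool_call` hide a newer `tool_result`.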



getSessionRecentOutput(sessionId: string, n = 10): OverseerRecentOutputChunk[] {
const limit = Math.min(Math.max(n, 1), 50)
const messages = this.messages.getMessages(sessionId, limit)

P2: Fetch enough rows before filtering recent output

When the last n stored messages are non-text tool-call/tool-result records, getMessages(sessionId, limit) returns only those rows and the subsequent plain-text filter drops them, so get_session_recent_output returns fewer chunks or an empty list even though earlier recent transcript text exists. Fetch a larger window and then slice after filtering so the tool fulfills its “last N transcript chunks” contract.
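
A sketch of "fetch a larger window, then slice after filtering". It assumes the fetch function returns the latest rows in oldest-to-newest order and that non-text rows can be recognized; `StoredMessage`, `recentTextChunks`, and the `overfetch` factor are hypothetical.

```typescript
// Hypothetical row shape: text is null for tool-call/tool-result records.
interface StoredMessage {
  seq: number
  text: string | null
}

// fetchLatest(limit) is assumed to return the latest `limit` rows, oldest first.
function recentTextChunks(
  fetchLatest: (limit: number) => StoredMessage[],
  n = 10,
  overfetch = 4,
): string[] {
  const limit = Math.min(Math.max(n, 1), 50)
  // Over-fetch so that non-text rows do not crowd out transcript text.
  const fetched = fetchLatest(limit * overfetch)
  const texts: string[] = []
  for (const message of fetched) {
    if (message.text !== null) texts.push(message.text)
  }
  // Keep only the newest `limit` text chunks.
  return texts.slice(-limit)
}
```

A fixed over-fetch factor is a heuristic; a long run of tool records can still exhaust the window, in which case paging until `limit` text chunks are found would be the stricter alternative.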


Comment on lines +364 to +365
if (record?.role === 'user' && typeof record.content === 'string') {
return record.content

P2: Parse structured user messages as transcript text

When operator messages use the normal web shape written by MessageService.sendMessage ({ role: 'user', content: { type: 'text', text } }), this branch only accepts string content and returns null, so get_session_recent_output drops the operator prompts from context. That leaves the Overseer seeing worker output without the instruction that produced it; extract content.text for text records before skipping the row.
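
A sketch of a text extractor that accepts both the plain-string shape and the structured `{ type: 'text', text }` shape described above. `extractUserText` is a hypothetical helper name, and the record is treated as untyped input.

```typescript
// Return the operator's text from a stored user record, or null if there is none.
function extractUserText(record: unknown): string | null {
  if (typeof record !== 'object' || record === null) return null
  const { role, content } = record as { role?: unknown; content?: unknown }
  if (role !== 'user') return null
  // Legacy shape: content is already a string.
  if (typeof content === 'string') return content
  // Structured shape: { type: 'text', text: string }.
  if (typeof content === 'object' && content !== null) {
    const { type, text } = content as { type?: unknown; text?: unknown }
    if (type === 'text' && typeof text === 'string') return text
  }
  return null
}
```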


Comment on lines +230 to +231
if (reported === 'blocked' || reported === 'failed' || reported === 'complete') {
return { state: reported, confidence: 0.9, note: `worker self-reported ${reported}` }

P2: Let live pending requests override stale completion reports

When a session has a current permission request, deriveObservedWorkerState returns waiting_on_operator, but an older worker completed event still makes this branch return complete with 0.9 confidence. In that scenario get_worker_health tells the Overseer the worker is complete while there is an active operator decision pending; treat observed === 'waiting_on_operator' as a conflicting current signal before terminal self-reports.
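
One possible ordering of the checks, sketched under assumptions: the `WorkerState` union below lists only illustrative states, and every confidence value other than the 0.9 quoted in the diff is made up for the example. `inferWorkerStateSketch` is a hypothetical stand-in, not the PR's `inferWorkerState`.

```typescript
// Illustrative state set; the real union in the PR may differ.
type WorkerState = 'working' | 'waiting_on_operator' | 'blocked' | 'failed' | 'complete' | 'unknown'

interface Inference {
  state: WorkerState
  confidence: number
  note: string
}

function inferWorkerStateSketch(reported: WorkerState | null, observed: WorkerState | null): Inference {
  if (reported === 'blocked' || reported === 'failed' || reported === 'complete') {
    // A live pending operator request is a conflicting current signal,
    // so it wins over an older terminal self-report.
    if (observed === 'waiting_on_operator') {
      return {
        state: 'waiting_on_operator',
        confidence: 0.8, // assumed value
        note: `pending operator request overrides self-reported ${reported}`,
      }
    }
    return { state: reported, confidence: 0.9, note: `worker self-reported ${reported}` }
  }
  if (observed !== null) {
    return { state: observed, confidence: 0.6, note: 'hub-observed state' } // assumed value
  }
  return { state: 'unknown', confidence: 0, note: 'no signal' }
}
```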


Comment on lines +100 to +104
let body: unknown
try {
  body = await c.req.json()
} catch {
  body = {}

P2: Reject malformed tool request bodies

When JSON parsing fails here, optional-argument tools run with {} instead of rejecting the request. A client that sends a malformed filtered query_events or query_inbox body receives the default unfiltered result set (up to 50 rows), which is surprising and can expose more operational context than intended. Return 400 on invalid JSON as the convo-turn route does.
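
A framework-independent sketch of the stricter parsing, operating on the raw body text so it does not depend on any router API. `ParsedBody` and `parseToolBody` are hypothetical, and treating an empty body as "no arguments" is an assumption, not something the review specifies.

```typescript
type ParsedBody =
  | { ok: true; body: unknown }
  | { ok: false; status: 400; error: string }

// Parse a raw request body; malformed JSON is rejected instead of becoming {}.
function parseToolBody(raw: string): ParsedBody {
  // Assumption: an empty body means the tool was called with no arguments.
  if (raw.trim() === '') return { ok: true, body: {} }
  try {
    return { ok: true, body: JSON.parse(raw) as unknown }
  } catch {
    return { ok: false, status: 400, error: 'invalid JSON body' }
  }
}
```

The route handler would return the 400 result directly and only pass `body` on to schema validation when `ok` is true.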

