feat(overseer): inbox substrate + v0 prioritizer (Step 2.5)#57

Open
heavygee wants to merge 57 commits into feat/overseer-events-substrate from feat/overseer-inbox-substrate
Conversation

@heavygee

Owner

Stacked on the events substrate (Step 2). Adds the inbox_items table + promotion job + v0 hand-tuned priority scorer + explain_priority provenance string + read-only inbox viewer. This is the Step 2.5 layer in the Overseer build sequence; it builds on Step 2 (events) and is built on by fix/overseer-inbox-stale-noise (PR #54) and the Step 2.75 replay harness. Soup integrates the full stack; clean upstream PR awaits the stack-wide rebase onto upstream/main (drops garden) via the integration soup process.

Made with Cursor

heavygee and others added 30 commits June 19, 2026 20:33
Soup verify: scratchlist layer already uses @/lib/relative-time; keep
canonical path and avoid TS2300 duplicate identifier at driver merge.
Parallel stress-test stops can race child exit cleanup; returning true when
the session is already gone matches ensure-stopped semantics.
Adds opt-in FCM HTTP v1 notification delivery so a companion mobile/wearable
app can receive permission, ready, and task notifications end-to-end. The
channel is gated entirely on FCM_SERVICE_ACCOUNT_PATH + FCM_PROJECT_ID being
set; operators not running a companion see zero behavior change.

What lands:

- POST/DELETE /api/devices/register — JWT-authed FCM token registry,
  upsert on (namespace, deviceId, platform), platforms `phone` | `wear`.
- Sqlite v9 → v10 migration adds `fcm_devices` (idx on namespace + token).
- FcmService — minimal HTTP v1 client, RS256 service-account JWT via
  jose (dep already in tree), 5-minute access-token cache, 401 retry.
- FcmNotificationChannel — implements NotificationChannel, sends data-only
  FCM (so companion can route to phone+watch surfaces). Body composition
  parses an optional trailing `AGENT_NOTIFY_SUMMARY {json}` line for richer
  ready summaries; truncates plain assistant text to 280 chars otherwise.
  Tags each payload with `severity` (info/warning/success/error) so clients
  can color/categorise the notification.
- PushNotificationChannel gains a NativeFallbackProbe — when a namespace
  has at least one registered FCM device, web-push and SSE in-page toast
  are skipped so the operator does not double-notify on phone+browser.
  Probe is no-op when no FCM device is registered; PWA-only setups
  unchanged. Branch trace gated on HAPI_NOTIFY_DEBUG=1.
- shared/src/messages.ts — `extractAssistantPlainText` (codex + Claude SDK
  shapes) and `extractNotifySummary` (strict end-anchored line parser).
- hub/src/notifications/toolArgs.ts — tool-arg formatters lifted out of
  telegram/sessionView (kept duplicated there in this PR; refactor of
  Telegram is a follow-up).
- docs/api/native-companion-contract.md — payload + endpoints + env vars,
  versioned at contract v1.
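
A minimal sketch of the "strict end-anchored line parser" idea, for readers of the contract. The real `extractNotifySummary` in shared/src/messages.ts is not shown in this PR page, so the function name, result shape, and truncation helper below are hypothetical; only the `AGENT_NOTIFY_SUMMARY {json}` trailer and the 280-char limit come from the description above.

```typescript
// Hypothetical result shape; the real one in shared/src/messages.ts may differ.
interface NotifySummaryResult {
  body: string; // assistant text with the summary line removed
  summary: Record<string, unknown> | null;
}

const PREFIX = "AGENT_NOTIFY_SUMMARY ";
const PLAIN_TEXT_LIMIT = 280; // from the PR description

function extractNotifySummarySketch(text: string): NotifySummaryResult {
  const trimmed = text.trimEnd();
  const lastBreak = trimmed.lastIndexOf("\n");
  const lastLine = trimmed.slice(lastBreak + 1);
  const fallback: NotifySummaryResult = { body: trimmed, summary: null };

  // Strict: only the final line counts, and it must start with the prefix.
  if (!lastLine.startsWith(PREFIX)) return fallback;
  try {
    const parsed: unknown = JSON.parse(lastLine.slice(PREFIX.length));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return fallback;
    }
    const body = lastBreak === -1 ? "" : trimmed.slice(0, lastBreak).trimEnd();
    return { body, summary: parsed as Record<string, unknown> };
  } catch {
    return fallback; // malformed JSON: treat the whole text as plain
  }
}

// Plain-text fallback: cap at the limit, ending with an ellipsis when cut.
function truncatePlain(text: string, limit: number = PLAIN_TEXT_LIMIT): string {
  return text.length <= limit ? text : text.slice(0, limit - 1) + "…";
}
```

A marker that appears mid-message is deliberately ignored, so quoted or echoed output cannot hijack the notification body.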

Test coverage:

- 260 hub tests pass (incl. 23 new across FCM channel, push dedup,
  v10 migration, devices route).
- 60 shared tests pass (messages parsers).

Notes for reviewers:

- Reference companion implementation lives in a separate Android repo
  (Kotlin, phone APK + Wear OS APK) — this PR is hub-side only.
- No new runtime deps (`jose` and `zod` already declared in hub).

Co-authored-by: Cursor <cursoragent@cursor.com>
…ub-on-phone

Adds a Scope section to the native-companion contract so anyone
implementing it knows the audience: operators running the hub on a
server who want phone/watch as a notification surface, not users
expecting a Termux-bundled hub. Mirrors the framing now in
heavygee/hapi-companion README.

Co-authored-by: Cursor <cursoragent@cursor.com>
Removes the prior framing that referenced a non-existent 'Termux
hub-on-phone' alternative. This contract describes a native client to
the same hub the PWA talks to; it does not change where the hub runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Companion section in Settings renders a QR code encoding the deeplink
hapicompanion://bind?hub=<base>&code=<token>. Scanning it from the HAPI
companion app (Android phone or Wear OS) auto-fills the bind form and
authenticates against this hub - no manual URL/token paste.

QR is gated behind a Show button so the access token doesn't sit visible
on screen by default; a Copy link affordance and the textual deeplink
are also exposed for manual onboarding.

Adds qrcode + @types/qrcode to web/ (already a hub dep, no new resolved
package - just a workspace declaration).

Co-authored-by: Cursor <cursoragent@cursor.com>
After the existing PWA access QR is rendered on tunnel start, also print
the hapicompanion://bind?hub=...&code=... deeplink and a matching QR.

Same tunnel + token, different scheme: phones with the companion app
installed pick up the deeplink via the manifest intent filter; phones
without it ignore it and fall back to the PWA QR above.

QR rendering failure is non-fatal in both cases - the textual deeplink
above the QR is sufficient for manual paste.

Co-authored-by: Cursor <cursoragent@cursor.com>
Two bugs surfaced by the upstream review bot:

1) Web Push silently dropped when FCM is not actually configured.
   The native-fallback probe only checked the device registry; it did
   not check whether resolveFcmConfig() actually succeeded. So an
   operator who previously enabled FCM, registered a phone, then later
   started the hub WITHOUT FCM_SERVICE_ACCOUNT_PATH would see the probe
   return true (devices still in DB) -> Web Push suppressed -> no FCM
   channel registered -> notifications go to /dev/null.

   Fix: extracted the probe construction into buildNativeFallbackProbe()
   which short-circuits to () => false when fcmConfig is missing. Probe
   never even consults the device store in the no-config branch, so
   stale rows can never matter.

2) Transient FCM failures permanently unregistered devices.
   sendToToken() returned a single boolean and sendToNamespace() removed
   any device whose send returned false. A 429 (rate limit), 503
   (server error), 401 (auth glitch), or even an ECONNREFUSED would
   delete the device row, after which the user would need to re-pair to
   get notifications again. The bot caught it; the fix is the obvious
   one.

   Fix: sendToToken() now returns 'sent' | 'invalid' | 'failed'.
   - 'invalid' is reserved for the responses that genuinely indicate a
     dead token: HTTP 404 with UNREGISTERED/NOT_FOUND, and HTTP 400
     with INVALID_ARGUMENT explicitly referencing the token field.
   - Everything else (429, 5xx, 401, 403, network errors) is 'failed'
     and counts toward the failed tally without removing the device.

   sendToNamespace() only calls removeDeviceByToken() on 'invalid'.
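
To illustrate the tri-state contract above, here is a rough sketch. The `DeviceStore` interface and the injected `sendToToken` callback are hypothetical stand-ins, not the hub's actual types; only the `'sent' | 'invalid' | 'failed'` outcomes and `removeDeviceByToken()` are taken from this commit message.

```typescript
type SendOutcome = "sent" | "invalid" | "failed";

// Hypothetical store surface, for illustration only.
interface DeviceStore {
  listTokens(namespace: string): string[];
  removeDeviceByToken(token: string): void;
}

interface NamespaceResult {
  sent: number;
  invalid: number;
  failed: number;
}

async function sendToNamespaceSketch(
  store: DeviceStore,
  namespace: string,
  sendToToken: (token: string) => Promise<SendOutcome>,
): Promise<NamespaceResult> {
  const result: NamespaceResult = { sent: 0, invalid: 0, failed: 0 };
  for (const token of store.listTokens(namespace)) {
    let outcome: SendOutcome;
    try {
      outcome = await sendToToken(token);
    } catch {
      outcome = "failed"; // network errors are transient, never a dead token
    }
    result[outcome] += 1;
    // Only a genuinely dead token costs the device its registration.
    if (outcome === "invalid") store.removeDeviceByToken(token);
  }
  return result;
}
```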

Tests: 11 new tests across two new files. fcmService.test.ts covers
all six branches (200, 404 unregistered, 429, 503, 401, network error)
plus a mixed-batch case that proves invalid tokens get removed in the
same call where transient-failure tokens survive. nativeFallbackProbe
.test.ts covers both no-config and configured branches plus the
explicit "no-config never touches the store" guarantee.

Hub test count: 273 -> 284 (all passing).

Co-authored-by: Cursor <cursoragent@cursor.com>
…ent type

HAPI Bot review on PR tiann#803 caught two contract-doc accuracy gaps:

1) Visibility rule was wrong. Doc said "FCM fires when Web Push would
   fire AND client not visible via SSE", but FcmNotificationChannel
   ALWAYS fires regardless of PWA visibility (deliberately - native
   companion is the canonical wrist-first surface, and there is a
   passing test asserting this). Companion app implementers reading
   the contract would have built foreground-suppression logic and
   then dropped notifications when the PWA tab was open.

2) Documented `session-completed` event doesn't exist. NotificationHub
   never calls into a 'session-completed' channel method on
   FcmNotificationChannel; the type would never reach a native client.
   Removed from the documented enum, leaving only the three actual
   events: ready, permission-request, task-notification.

Co-authored-by: Cursor <cursoragent@cursor.com>
…h break

Co-authored-by: Cursor <cursoragent@cursor.com>
…works

The Settings -> Companion pairing QR reads the original CLI access token
from localStorage (hapi_access_token::<baseUrl>) so it can be encoded into
the hapicompanion://bind deeplink. For browser/CLI logins useAuthSource
already persists the token via setAccessToken, but the Telegram Mini App
bind path went through useAuth.bind() which exchanged the typed CLI token
for a JWT and never persisted it. Telegram users therefore always saw the
"signed in via Telegram..." fallback and got no usable QR.

After a successful client.bind() we now mirror useAuthSource's behavior
and write the same accessToken to the same localStorage key, restoring
parity between the two auth paths. No change for browser/CLI users.

Co-authored-by: Cursor <cursoragent@cursor.com>
The native-fallback probe previously returned true whenever FCM was
configured AND devices were registered, which suppressed web-push for
the namespace. The HAPI Bot correctly pointed out the gap: if the FCM
pipeline silently breaks (expired service-account key, sustained 5xx,
OAuth token-fetch failure, network blackhole) the operator gets nothing
on either channel until they manually intervene.

Approach (deliberate, not the bot's exact suggested fix):

- FcmService now keeps a small rolling window (last 8 outcomes) of send
  attempts and exposes `isHealthy()`. The threshold is 5+/8 failures =
  unhealthy; the buffer starts empty so a freshly-booted hub is
  optimistic ("innocent until proven guilty") and does not double-fire
  on event #1.
- Token-fetch failure (`getFcmAccessToken` throws) now records exactly
  one health-failure (not one per device), short-circuits the send
  loop, and returns a result so `sendToNamespace` no longer leaks the
  exception.
- `invalid` token responses are explicitly excluded from the health
  buffer because they are per-device facts (rotated/uninstalled token),
  not pipeline failures - FCM was reachable, it just rejected one
  stale token.
- `buildNativeFallbackProbe` now optionally accepts the FcmService and
  short-circuits to "let web-push fire" when health is bad, before it
  even queries the device registry. The single-arg call shape is still
  supported for back-compat.
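
A small sketch of the rolling window and probe described in the bullets above, under the semantics of this commit (empty buffer counts as healthy). The class and function names are hypothetical; the window of 8 and the 5-failure threshold come from the text. `invalid` outcomes are excluded simply by never being recorded.

```typescript
const HEALTH_WINDOW = 8;
const HEALTH_FAILURE_THRESHOLD = 5;

// Hypothetical helper; the PR keeps this state inside FcmService.
class RollingHealth {
  private outcomes: boolean[] = []; // true = success, false = failure

  record(ok: boolean): void {
    this.outcomes.push(ok);
    if (this.outcomes.length > HEALTH_WINDOW) this.outcomes.shift();
  }

  // In this version an empty buffer is treated as healthy.
  isHealthy(): boolean {
    const failures = this.outcomes.filter((ok) => !ok).length;
    return failures < HEALTH_FAILURE_THRESHOLD;
  }
}

// Returns a probe answering "should web-push be suppressed for this namespace?".
function buildProbeSketch(
  hasDevices: (namespace: string) => boolean,
  health?: RollingHealth, // optional, mirroring the single-arg back-compat shape
): (namespace: string) => boolean {
  return (namespace) => {
    // Bad health short-circuits before the device registry is consulted.
    if (health && !health.isHealthy()) return false;
    return hasDevices(namespace);
  };
}
```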

Why not the bot's exact suggestion ("invert: call FCM first, fall back
on result.sent === 0"):
- Couples PushNotificationChannel to FcmService and FcmSendPayload,
  reversing the clean parallel-channel architecture established earlier
  in this PR.
- Treats every transient single-event failure as fallback-worthy, which
  re-opens the duplicate-notification race that the suppression logic
  was added to close (FCM HTTP timeout that delivers later + the web
  push we sent in the meantime = two pings).
- A rolling health window only flips on sustained breakage, which is
  the actual operational scenario the bot is worried about.

The wrist-first design intent ("FCM fires unconditionally, web-push is
suppressed for the same namespace") documented in
docs/api/native-companion-contract.md is preserved on the happy path.
The probe only re-enables web-push when there is concrete evidence the
native pipeline is not delivering.

Tests:
- New FcmService.isHealthy suite covers empty-buffer, threshold flip,
  recovery as failures age out of the window, invalid-token exclusion,
  and network-error path.
- nativeFallbackProbe gains coverage for the unhealthy-but-registered,
  healthy-and-registered, and absent-fcmService (back-compat) cases.
- All 292 hub tests still pass; typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
…dule

The Telegram session view had its own copy of formatToolArgumentsDetailed
identical to the one in hub/src/notifications/toolArgs.ts (already used by
the FCM channel). Replace the local copy with an import.

Removes ~70 lines of duplication, plus the now-unused MAX_TOOL_ARGS_LENGTH
constant and `truncate` import. The shared signature accepts an optional
opts arg whose default maxArgLength is 150 - matching the prior constant -
so the call site is unchanged.

Two benign upgrades come along for the ride from the shared module:
?? instead of || on field fallbacks (no real-world difference; permission
arguments never carry empty-string fields), and String(...) wrapping plus
a typeof object guard that makes non-string values render gracefully
instead of throwing into the catch block.

Hub tests: 311 pass / 0 fail. Telegram subset: 5 pass / 0 fail. typecheck
green.

Cold-reviewed by an out-of-context Claude Opus peer before push.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ng web-push

Addresses HAPI Bot Major review on PR tiann#803.

The previous health gate treated an empty outcome buffer as healthy
("innocent until proven guilty"). That created a silent-blackhole window
on cold start with broken FCM credentials: the push channel suppressed
SSE/Web Push for the first ~5 events while the FCM channel attempted
each delivery and recorded failures, until enough stacked to flip the
threshold. Every notification in that gap was silently lost.

New invariant: isHealthy() requires at least one successful FCM send in
the recent window (HEALTH_WINDOW=8) AND failures below threshold
(HEALTH_FAILURE_THRESHOLD=5). Both conditions are necessary; either
alone is insufficient evidence to safely suppress web-push fallback.
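
The new invariant can be written as a pure predicate over recent outcomes. This is only a sketch: the function name and the `Outcome` type are hypothetical, while the window (8) and threshold (5) are the constants named above.

```typescript
type Outcome = "success" | "failure";

function isHealthySketch(
  recent: readonly Outcome[],
  window: number = 8, // HEALTH_WINDOW
  failureThreshold: number = 5, // HEALTH_FAILURE_THRESHOLD
): boolean {
  const slice = recent.slice(-window);
  const successes = slice.filter((o) => o === "success").length;
  const failures = slice.length - successes;
  // Both conditions are required: positive evidence AND few failures.
  return successes >= 1 && failures < failureThreshold;
}
```

An empty history and a failures-only history both return `false`, which is the cold-start blackhole case this commit closes.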

Trade-off: one duplicated notification per hub restart per namespace.
On the first event after restart, web-push fires alongside FCM (because
the gate has no positive evidence yet). Once FCM records that first
success, the gate engages and subsequent events are FCM-only. Worth it
for guaranteed delivery during cold-start outages.

Tests reworked to match new semantics:
- "starts UNHEALTHY with empty buffer" (was: healthy)
- "flips to healthy after first successful send" (new)
- "stays unhealthy across failures-only run" (new, exercises the exact
  blackhole scenario the bot flagged)
- "flips back to unhealthy after threshold breach with prior successes"
  (renamed, establishes successes first)
- "invalid tokens don't count against health" (reworked: send a mixed
  batch first to establish health, then verify invalids don't flip it)
- "network errors count as failures" (reworked: establish health first)

Hub tests: 313 pass / 0 fail. typecheck green.

Co-authored-by: Cursor <cursoragent@cursor.com>
…9→V10

Upstream/main landed sessions.service_tier at schema v10. The companion
FCM device registry now migrates at v11 so both changes compose cleanly
after the courtesy rebase onto current upstream/main.

Co-authored-by: Cursor <cursoragent@cursor.com>
FCM runs before web-push; PushNotificationChannel skips web/SSE only
when the same notify() dispatch already delivered via FCM. Removes the
isHealthy()+device-row probe that could suppress web-push after warm
FCM outages.
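
Roughly, the ordering described here could look like the sketch below. The per-dispatch `DispatchContext` flag and the factory functions are hypothetical; the PR does not show how the two channels actually share the "already delivered via FCM" fact.

```typescript
// Hypothetical per-dispatch state shared by the channels of one notify() call.
interface DispatchContext {
  deliveredViaFcm: boolean;
}

interface Channel {
  send(event: string, ctx: DispatchContext): Promise<void>;
}

// sendFcm resolves to the number of devices that accepted the message.
function fcmChannel(sendFcm: (event: string) => Promise<number>): Channel {
  return {
    async send(event, ctx) {
      try {
        ctx.deliveredViaFcm = (await sendFcm(event)) > 0;
      } catch {
        ctx.deliveredViaFcm = false; // an outage must not suppress web-push
      }
    },
  };
}

function pushChannel(sendWebPush: (event: string) => Promise<void>): Channel {
  return {
    async send(event, ctx) {
      if (ctx.deliveredViaFcm) return; // same dispatch already reached a device
      await sendWebPush(event);
    },
  };
}

async function notifySketch(event: string, channels: Channel[]): Promise<void> {
  const ctx: DispatchContext = { deliveredViaFcm: false };
  // Sequential on purpose: the FCM channel must be listed (and finish) first.
  for (const channel of channels) await channel.send(event, ctx);
}
```

Because the decision is made per dispatch from an actual delivery result, there is no health state that can go stale after a warm outage.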

Co-authored-by: Cursor <cursoragent@cursor.com>
FcmNotificationChannel now implements the optional sendModelError hook so
NotificationHub can reach the native companion when cursor-agent hits a
model-side failure. Uses shared modelErrorCopy strings, severity=error,
distinct model-error-${sessionId}-${atTs} tags, and always calls deliver()
(wrist-first, no SSE shortcut).

Depends on feat/companion-fcm-push-api and feat/cursor-detect-inline-model-errors
both being lower in the driver soup stack.

Co-authored-by: Cursor <cursoragent@cursor.com>
Rebase follow-up: truncate AGENT_NOTIFY_SUMMARY summary/action before
FCM data payload (bot Major). Fix usePwaUpdate.test.ts setTimeout mock
cast so bun typecheck passes on current main.

Co-authored-by: Cursor <cursoragent@cursor.com>
Whitelist and truncate AGENT_NOTIFY_SUMMARY auxiliary fields before
JSON serialization; cap task-notification summaries to glance limit.

Co-authored-by: Cursor <cursoragent@cursor.com>
10s AbortSignal.timeout on OAuth + FCM send so sequential web-push
fallback is not blocked on hung Google endpoints; truncate Grep/Glob
pattern in permission detail formatter.
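
A sketch of the timeout pattern, assuming a runtime that provides `AbortSignal.timeout` (Bun and recent Node versions do). The `FetchLike` type and `postWithTimeout` helper are hypothetical names used so the example does not depend on a live endpoint.

```typescript
const FCM_TIMEOUT_MS = 10_000; // the 10s budget named in the commit

// Narrow fetch-shaped type so a stub can be injected.
type FetchLike = (
  url: string,
  init: { method: string; body?: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

async function postWithTimeout(
  fetchImpl: FetchLike,
  url: string,
  body: string,
  timeoutMs: number = FCM_TIMEOUT_MS,
): Promise<"ok" | "failed"> {
  try {
    const res = await fetchImpl(url, {
      method: "POST",
      body,
      // Aborts the request on its own once the budget is spent.
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok ? "ok" : "failed";
  } catch {
    // Timeouts and network errors both land here; the caller can move on
    // to the web-push fallback instead of waiting on a hung endpoint.
    return "failed";
  }
}
```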

Co-authored-by: Cursor <cursoragent@cursor.com>
Delete stale fcm_devices rows sharing the same token when a native
install registers under a different namespace.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add en/zh-CN keys for the Companion section title and CompanionPairing
strings; matches locale-driven Settings pattern (bot Minor on tiann#803).

Co-authored-by: Cursor <cursoragent@cursor.com>
Parse FCM error JSON: only UNREGISTERED or token-field INVALID_ARGUMENT
unregister devices; generic NOT_FOUND stays transient. Guard limit<=3
in truncateReadyText so tiny action budgets cannot blow the glance cap.

Co-authored-by: Cursor <cursoragent@cursor.com>
FCM v1 often returns HTTP 404 with root NOT_FOUND plus
details[].errorCode UNREGISTERED; prune those tokens while keeping
generic project/resource NOT_FOUND transient.
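
The classification rule from the last two commits might be sketched as below. The error-body shape (`error.status`, `error.details[].errorCode`, `fieldViolations[].field`) is an assumption about what FCM HTTP v1 returns, and the "token field" check is a guess at how the hub detects it; the actual parser is not shown here.

```typescript
type SendOutcome = "sent" | "invalid" | "failed";

// Assumed subset of the FCM v1 error body; fields not needed here are omitted.
interface FcmErrorBody {
  error?: {
    status?: string;
    details?: Array<{
      errorCode?: string;
      fieldViolations?: Array<{ field?: string }>;
    }>;
  };
}

function classifyFcmResponse(httpStatus: number, bodyText: string): SendOutcome {
  if (httpStatus >= 200 && httpStatus < 300) return "sent";

  let parsed: unknown;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return "failed"; // unparseable body: assume transient
  }
  if (typeof parsed !== "object" || parsed === null) return "failed";

  const error = (parsed as FcmErrorBody).error;
  const details = error?.details ?? [];

  // Dead token: UNREGISTERED at the root or inside details[].errorCode.
  if (error?.status === "UNREGISTERED") return "invalid";
  if (details.some((d) => d.errorCode === "UNREGISTERED")) return "invalid";

  // INVALID_ARGUMENT only counts when it points at the token field.
  const tokenRejected = details.some((d) =>
    (d.fieldViolations ?? []).some((v) => (v.field ?? "").includes("token")),
  );
  if (httpStatus === 400 && error?.status === "INVALID_ARGUMENT" && tokenRejected) {
    return "invalid";
  }

  // Generic NOT_FOUND, 401, 403, 429 and 5xx all stay transient.
  return "failed";
}
```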

Co-authored-by: Cursor <cursoragent@cursor.com>
heavygee and others added 27 commits June 20, 2026 22:47
…tion

# Conflicts:
#	hub/src/fcm/fcmNotificationChannel.test.ts
…ation

# Conflicts:
#	hub/src/socket/handlers/cli/sessionHandlers.ts
#	shared/src/schemas.ts
# Conflicts:
#	hub/src/store/index.ts
#	hub/src/store/types.ts
Soup-local fixup. Files are unchanged in upstream/main and typecheck
passes on upstream/main, but typecheck on the layered driver/integration
soup produces:

  hub/src/tunnel/tlsGate.ts(70,37): TS2345 string | string[] -> string
  hub/src/web/routes/guards.ts(46,46): TS2345 string | undefined -> string

Likely cause: a layered merge resolves @types/node (or a peer) to a
different transitive version that exposes the wider PeerCertificate.CN
type and the Hono c.req.param() string|undefined return. Both call sites
were already not-quite-safe at runtime - this layer adds the missing
narrowing without changing semantics:
  tlsGate: typeof guard rejects multi-CN certs (was already the
    practical behaviour - dnsNameMatchesHost would have stringified
    the array and never matched any real DNS suffix).
  guards: returns 400 with explicit message when the route param is
    missing, instead of passing undefined through to requireSession
    where it would have produced a confusing 404 cascade.
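
For illustration, the two narrowings could look roughly like this. Both helper names and the result type are hypothetical; the real changes live inline in tlsGate.ts and guards.ts and are not reproduced on this page.

```typescript
// tlsGate-style narrowing: a multi-CN certificate is rejected outright
// instead of being stringified and compared against a DNS suffix.
function singleCommonName(cn: string | string[] | undefined): string | null {
  return typeof cn === "string" ? cn : null;
}

// guards-style narrowing: a missing route param becomes an explicit 400.
type ParamResult =
  | { ok: true; value: string }
  | { ok: false; status: 400; message: string };

function requireParam(value: string | undefined, name: string): ParamResult {
  if (value === undefined) {
    return { ok: false, status: 400, message: `Missing route parameter: ${name}` };
  }
  return { ok: true, value };
}
```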

Not appropriate upstream as-is: upstream/main typecheck is clean
without these changes, and the fixes are paving over a soup-only
type drift, not a real bug in upstream code. Belongs in soup until
the underlying transitive dep / lockfile drift is identified and
either upstreamed or pinned.

Co-authored-by: Cursor <cursoragent@cursor.com>
Soup verify: mark optional Pi model/command fields .optional() so
transformed undefined does not fail object parse under Zod 4.
Soup verify flake: fast in-memory updates can share millisecond with
prior updatedAt; contract is monotonic (>=), not strictly greater.
…tion

# Conflicts:
#	hub/src/fcm/fcmNotificationChannel.test.ts
…ation

# Conflicts:
#	hub/src/socket/handlers/cli/sessionHandlers.ts
#	shared/src/schemas.ts
# Conflicts:
#	hub/src/store/index.ts
#	hub/src/store/types.ts
Persist overseer events in SQLite v11 (events, event_links, FTS5), record
from assistant notify summaries with hub fallbacks, expose GET /api/system-events,
and add a read-only settings debug pane. Includes db-prep v11→v10 downgrade and
Playwright fixture smoke.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Soup stacking: scratchlist and FCM layers also claim v11. Idempotent
CREATE IF NOT EXISTS for all three substrates avoids rerere dropping
tables when overseer merges last.
Events tables must not own SCHEMA_VERSION — ensureOverseerEventsSchema runs
on every Store boot so v11-stamped soup DBs self-heal. Add events/event_links
to REQUIRED_TABLES, fix content-storing FTS delete/update triggers, and extend
db-prep with full soup v11 downgrade plus --drop-overseer-events.

Co-authored-by: Cursor <cursoragent@cursor.com>
events.related_session_id FK blocked DELETE /sessions and reopen merge
(delete old row). Detach on intentional delete; repoint to new session id
on mergeSessions so overseer audit trail survives reopen/resume id swap.

Co-authored-by: Cursor <cursoragent@cursor.com>
…tone (#22)

Events embed payload.session (id, tag, name, project, flavor) at write time
so audit rows stay self-describing after hard-delete. deleteSession snapshots
identity into init-gated deleted_sessions before detach.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add init-gated inbox_items schema, per-session promotion from attention
events, coarse-rank/oldest-within ordering, operator-action logging, REST
+ settings debug pane stacked on the #22 events substrate.

Co-authored-by: Cursor <cursoragent@cursor.com>
@chatgpt-codex-connector


💡 Codex Review

const events = engine.getSystemEvents({
limit: parsed.data.limit ?? 50,
beforeId: parsed.data.beforeId ?? null,
sessionId: parsed.data.sessionId ?? null,

P1 Badge Scope system event queries by namespace

When an authenticated token belongs to another namespace, this endpoint still calls engine.getSystemEvents without using c.get('namespace') or checking access to sessionId; the store query reads the global events table. That lets any authenticated namespace enumerate event summaries/payloads for other namespaces, or pass another namespace's session id directly. Filter via sessions.namespace or validate the requested session with the existing session access guard before listing.
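
One way to read the suggested fix, as an in-memory sketch. The row types and `listEventsForNamespace` are hypothetical (the real store is SQLite and would do this with a join on `sessions.namespace`); excluding events that have no related session is also an assumption.

```typescript
interface SessionRow {
  id: string;
  namespace: string;
}

interface EventRow {
  id: number;
  relatedSessionId: string | null;
  summary: string;
}

interface EventQuery {
  limit?: number;
  beforeId?: number | null;
  sessionId?: string | null;
}

function listEventsForNamespace(
  events: EventRow[],
  sessions: SessionRow[],
  namespace: string,
  query: EventQuery = {},
): EventRow[] {
  const owned = new Set(
    sessions.filter((s) => s.namespace === namespace).map((s) => s.id),
  );
  // A session id from another namespace yields nothing, not a global read.
  if (query.sessionId != null && !owned.has(query.sessionId)) return [];

  return events
    .filter((e) => e.relatedSessionId !== null && owned.has(e.relatedSessionId))
    .filter((e) => query.sessionId == null || e.relatedSessionId === query.sessionId)
    .filter((e) => query.beforeId == null || e.id < query.beforeId)
    .sort((a, b) => b.id - a.id) // newest first
    .slice(0, query.limit ?? 50);
}
```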


const items = engine.getInboxItems({
limit: parsed.data.limit ?? 50,
activeOnly,
sessionId: parsed.data.sessionId ?? null
})

P1 Badge Scope inbox item access by namespace

In a multi-namespace hub, this endpoint lists inbox_items globally and the action endpoint below mutates items by global numeric id with no namespace ownership check. Since inbox rows contain titles/summaries and actions change status, a user in one namespace can see or resolve another namespace's attention items. List through the related session's namespace and validate item ownership before recording actions.
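
The ownership check before recording an action could be sketched as follows. The types, the `resolveInboxItem` name, and the choice of a 404 for foreign items are all hypothetical; the review only asks that ownership be validated through the related session's namespace.

```typescript
interface InboxItem {
  id: number;
  sessionId: string;
  status: "active" | "resolved";
}

type ActionResult =
  | { ok: true; item: InboxItem }
  | { ok: false; status: 404 };

function resolveInboxItem(
  items: Map<number, InboxItem>,
  sessionNamespace: Map<string, string>, // sessionId -> namespace
  callerNamespace: string,
  itemId: number,
): ActionResult {
  const item = items.get(itemId);
  // Same answer for "missing" and "not yours", so ids cannot be probed.
  if (!item || sessionNamespace.get(item.sessionId) !== callerNamespace) {
    return { ok: false, status: 404 };
  }
  const updated: InboxItem = { ...item, status: "resolved" };
  items.set(itemId, updated);
  return { ok: true, item: updated };
}
```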


metadata: Object.prototype.hasOwnProperty.call(patch, 'metadata')
? toSummaryMetadata(patch.metadata ?? null)
: current.metadata,

P2 Badge Keep summary metadata behind the version gate

After an SSE reconnect, if a REST refetch has already populated the detail cache with a newer metadataVersion, a buffered older metadata patch still reaches patchSessionSummary; these lines replace the list row's metadata before the version check below rejects the stale patch. The result is a corrupted/missing name/path/flavor in the session list until a full refetch. Leave current.metadata here and only assign metadata inside the version-gated block.
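
A sketch of keeping the metadata assignment inside the version gate. The `Summary` and `SummaryPatch` shapes and the `active` field are hypothetical simplifications; only `metadata`, `metadataVersion`, and the `hasOwnProperty` check come from the excerpt above, and using a strict `>` comparison is an assumption.

```typescript
interface Summary {
  id: string;
  metadata: { name: string } | null;
  metadataVersion: number;
  active: boolean;
}

interface SummaryPatch {
  metadata?: { name: string } | null;
  metadataVersion?: number;
  active?: boolean;
}

function patchSummarySketch(current: Summary, patch: SummaryPatch): Summary {
  // Non-versioned fields apply unconditionally.
  const next: Summary = { ...current, active: patch.active ?? current.active };

  const hasMetadata = Object.prototype.hasOwnProperty.call(patch, "metadata");
  const version = patch.metadataVersion;
  // Metadata is only touched when the patch is strictly newer.
  if (hasMetadata && version !== undefined && version > current.metadataVersion) {
    next.metadata = patch.metadata ?? null;
    next.metadataVersion = version;
  }
  return next;
}
```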


path: args.workspacePath ?? '',

P2 Badge Refuse cursor imports without a resume path

When a Cursor import cannot infer workspacePath (for example a legacy chat with no reverse lookup candidate, or an ACP meta without cwd), this writes metadata.path as an empty string and still creates the HAPI row. That row cannot be opened or resumed afterward because SyncEngine.resolveLocalResumeTarget rejects empty paths as Session metadata missing path, leaving the imported Cursor session unusable. Refuse the import until a path is supplied, or persist a real workspace path.
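
The "refuse the import" option might look like this sketch. `buildCursorImportMetadata`, the result type, and the error text are hypothetical; treating a whitespace-only path as missing is also an assumption.

```typescript
type ImportResult =
  | { ok: true; metadata: { path: string } }
  | { ok: false; error: string };

function buildCursorImportMetadata(args: {
  workspacePath?: string | null;
}): ImportResult {
  const path = args.workspacePath?.trim() ?? "";
  if (path === "") {
    // No row is created, so no unopenable session is left behind.
    return { ok: false, error: "Cursor import requires a workspace path" };
  }
  return { ok: true, metadata: { path } };
}
```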


?? metadata.geminiSessionId
?? metadata.opencodeSessionId
?? metadata.cursorSessionId
?? metadata.kimiSessionId
?? undefined,

P2 Badge Preserve Pi session ids in summaries

Pi sessions store their native resume id as metadata.piSessionId, but this summary flattener stops at Kimi. Inactive Pi rows that do not yet have a name or summary therefore have no agentSessionId, so the sidebar's empty-stub filter can hide real resumable Pi sessions and duplicate Pi rows bypass the list dedupe. Include piSessionId in this chain, and mirror it in the SSE summary patch helper.
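
A sketch of the extended fallback chain. The excerpt above begins mid-expression, so any earlier entries are elided and left out here; the interface and function name are hypothetical, and placing `piSessionId` after Kimi is an assumption.

```typescript
// Only the ids visible in the excerpt, plus the Pi id the review asks for.
interface SessionMetadataIds {
  geminiSessionId?: string;
  opencodeSessionId?: string;
  cursorSessionId?: string;
  kimiSessionId?: string;
  piSessionId?: string;
}

function agentSessionIdOf(metadata: SessionMetadataIds): string | undefined {
  return (
    metadata.geminiSessionId
    ?? metadata.opencodeSessionId
    ?? metadata.cursorSessionId
    ?? metadata.kimiSessionId
    ?? metadata.piSessionId // previously missing, so Pi rows had no id
    ?? undefined
  );
}
```

The same chain would need to be mirrored in the SSE summary patch helper so both paths agree on `agentSessionId`.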
