feat(hub): native companion (FCM) push channel + device registry + pairing QR by heavygee · Pull Request #803 · tiann/hapi

heavygee · 2026-06-04T17:06:20Z

Summary

Adds an opt-in, contract-only path for native companion apps (Android phone + Wear OS) to register for FCM push and receive the same notifications HAPI already sends via Web Push - without changing default behavior for any existing user.

Companion to discussion #746.

The companion app source itself is not in this PR; it lives at heavygee/hapi-companion (currently dogfood, eventual home is wherever the maintainer prefers - happy to transfer / mirror). This PR is just the hub-side surface the app talks to, plus a PWA Settings UI + terminal QR for pairing.

Hub additions

POST /api/devices/register + DELETE /api/devices/register - FCM token registration scoped per (namespace, deviceId, platform). platform is phone or wear.
hub/src/fcm/ - optional FcmService + FcmNotificationChannel, gated entirely by FCM_SERVICE_ACCOUNT_PATH env. If unset, no FCM code paths run, no behavior changes.
Schema migration to v10 adds fcm_devices table (id, namespace, token, platform, deviceId, timestamps, unique on (namespace, deviceId, platform)).
shared/messages.ts - extractAssistantPlainText() and extractNotifySummary() helpers used by the FCM channel to build human-readable notification bodies (and parse optional AGENT_NOTIFY_SUMMARY {...} JSON tail for richer summaries when agents emit it).
docs/api/native-companion-contract.md - documents the data field schema, REST endpoints, severity values, and env knobs.

PWA additions

Settings -> Companion section renders a QR code encoding hapicompanion://bind?hub=<base>&code=<token>. QR is gated behind a Show button so the access token isn't always on screen. Copy-link + textual deeplink are also exposed.
Hub terminal startup prints both QRs side by side: existing PWA QR (unchanged) plus a new companion deeplink QR. PWA users ignore the second; native users with the app installed onboard from it via Android intent filter.
Adds qrcode to web/ (already a hub dep; bun lockfile diff is just the workspace declaration, no new resolved packages).

Behavior gates / opt-in story

User state	What changes
No FCM env, no companion app	Nothing. Web Push and SSE work as today.
FCM env set, no companion registered	Nothing. FCM channel sees no devices and no-ops.
FCM env set, companion registered for namespace X	FCM fires for X. Web Push is suppressed for that namespace (avoids duplicate notifications on phone + watch). Other namespaces are unaffected.

The Web Push suppression is the only existing-user-visible change, and only triggers when an FCM device is registered. Happy to gate it behind an additional config flag (e.g. HAPI_PUSH_PREFER_NATIVE) if you'd prefer Web Push remain literally untouched until the operator opts in twice. Disclosed up front so it's not a surprise.

What this PR is not

Not the companion app itself (separate repo).
No Firebase service account keys, no Play Store config, no signed APKs - secrets / credentials are entirely the maintainer's call when (if) you adopt this.
No mandatory dependency on hapi.run infrastructure - self-hosters provision their own Firebase project; the env-gated FCM_SERVICE_ACCOUNT_PATH points at it. Distribution-tier discussion lives in the companion repo's docs/distribution-spectrum.md.

Test plan

bun run typecheck passes (hub, web, shared)
bun run test passes - 273 hub tests, 736 web tests, no regressions
Migration v10 unit-tested (hub/src/store/migration-v10.test.ts)
Device routes unit-tested (hub/src/web/routes/devices.test.ts)
FCM channel unit-tested (hub/src/fcm/fcmNotificationChannel.test.ts)
Push channel suppression unit-tested (hub/src/push/pushNotificationChannel.test.ts)
extractAssistantPlainText + extractNotifySummary unit-tested (shared/src/messages.test.ts)
Real-device dogfood: Pixel 10a + Pixel Watch, FCM ready / permission / task notifications observed end-to-end via the companion app for ~3 days
Reviewer smoke: with FCM_SERVICE_ACCOUNT_PATH unset, no FCM code paths execute (please confirm at your leisure)

Companion app status

Wear OS app: notification tap opens session, Allow/Deny/Reply work via phone bridge, "Open on phone" launches the matching session URL, severity-based notification colors, payload-shrink filter for the 100KB Wearable Data Layer cap.
Phone app: FCM relay to watch, deeplink-based pairing, dark Material3 theme.
Currently versioned 0.3.8 (phone) / 0.5.14 (wear), dogfood-only, no Play Store listing.

Going forward (out of PR scope, your call)

Whether the Play Store binary you ship supports Tier 1 (BYO-Firebase via pairing QR) and/or Tier 2 (relay through hapi.run) for self-hosters - tier doc linked above.
Repo transfer / collab arrangement for heavygee/hapi-companion. You're already an admin collaborator; happy to transfer ownership when you're ready and stay on as a committer.

github-actions

Findings

[Major] Web push is suppressed even when no FCM channel is active — nativeFallbackProbe returns true solely from stored device rows, but the FCM notification channel is only registered when resolveFcmConfig() succeeds. If an operator has an fcm_devices row and then runs the hub without FCM_SERVICE_ACCOUNT_PATH/project config, PushNotificationChannel returns early and no FCM channel exists to deliver anything, so ready/permission/task notifications disappear. Evidence hub/src/startHub.ts:206.
Suggested fix:
```
const fcmConfig = resolveFcmConfig()
const nativeFallbackProbe = (namespace: string): boolean =>
    Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0

const notificationChannels: NotificationChannel[] = [
    new PushNotificationChannel(
        pushService,
        sseManager,
        visibilityTracker,
        config.publicUrl,
        nativeFallbackProbe
    )
]
```

Summary

Review mode: initial
Found one notification delivery regression in the FCM/web-push fallback interaction.

Testing

Not run (automation)

HAPI Bot

github-actions

Findings

[Major] Web push is suppressed when FCM is not configured — nativeFallbackProbe only checks stored device rows, but the FCM channel is registered only when resolveFcmConfig() succeeds. A stale companion row plus a hub started without FCM_SERVICE_ACCOUNT_PATH makes PushNotificationChannel return before SSE/web-push while no FCM channel exists to deliver the notification. Evidence hub/src/startHub.ts:207.
Suggested fix:
```
const fcmConfig = resolveFcmConfig()
const nativeFallbackProbe = (namespace: string): boolean =>
    Boolean(fcmConfig) && store.fcm.getDevicesByNamespace(namespace).length > 0
```
[Major] Transient FCM failures delete registered devices — sendToToken() returns false for every non-OK response, but sendToNamespace() treats every false as an invalid token and removes it from fcm_devices. A 429/500/503 from Google or a temporary auth/project problem can permanently unregister the phone/watch, after which web-push also stays suppressed because registration state changed unexpectedly. Evidence hub/src/fcm/fcmService.ts:73.
Suggested fix:
```
type FcmTokenSendResult = 'sent' | 'invalid' | 'failed'

const result = await this.sendToToken(accessToken, device.token, payload, device.platform)
if (result === 'sent') {
    sent += 1
    return
}
failed += 1
if (result === 'invalid') {
    invalidTokens.push(device.token)
    this.store.fcm.removeDeviceByToken(namespace, device.token)
}
```

Summary

Review mode: follow-up after new commits
Prior bot finding is still present at the current head, and the FCM send path adds a separate device-registration data-loss risk.

Testing

Not run (automation). Missing coverage for stale fcm_devices rows when FCM config is absent, and for retaining devices on transient FCM 429/5xx responses.

HAPI Bot

Two bugs surfaced by the upstream review bot: 1) Web Push silently dropped when FCM is not actually configured. The native-fallback probe only checked the device registry; it did not check whether resolveFcmConfig() actually succeeded. So an operator who previously enabled FCM, registered a phone, then later started the hub WITHOUT FCM_SERVICE_ACCOUNT_PATH would see the probe return true (devices still in DB) -> Web Push suppressed -> no FCM channel registered -> notifications go to /dev/null. Fix: extracted the probe construction into buildNativeFallbackProbe() which short-circuits to () => false when fcmConfig is missing. Probe never even consults the device store in the no-config branch, so stale rows can never matter. 2) Transient FCM failures permanently unregistered devices. sendToToken() returned a single boolean and sendToNamespace() removed any device whose send returned false. A 429 (rate limit), 503 (server error), 401 (auth glitch), or even an ECONNREFUSED would delete the device row, after which the user would need to re-pair to get notifications again. The bot caught it; the fix is the obvious one. Fix: sendToToken() now returns 'sent' | 'invalid' | 'failed'. - 'invalid' is reserved for the responses that genuinely indicate a dead token: HTTP 404 with UNREGISTERED/NOT_FOUND, and HTTP 400 with INVALID_ARGUMENT explicitly referencing the token field. - Everything else (429, 5xx, 401, 403, network errors) is 'failed' and counts toward the failed tally without removing the device. sendToNamespace() only calls removeDeviceByToken() on 'invalid'. Tests: 11 new tests across two new files. fcmService.test.ts covers all six branches (200, 404 unregistered, 429, 503, 401, network error) plus a mixed-batch case that proves invalid tokens get removed in the same call where transient-failure tokens survive. nativeFallbackProbe .test.ts covers both no-config and configured branches plus the explicit "no-config never touches the store" guarantee. Hub test count: 273 -> 284 (all passing). Co-authored-by: Cursor <cursoragent@cursor.com>

heavygee · 2026-06-05T00:06:38Z

Both findings addressed in 078ec8e:

Finding 1 (web-push gated on fcmConfig): extracted the probe construction into buildNativeFallbackProbe(). When fcmConfig is null the function short-circuits to () => false and never consults the device store, so stale rows from a prior FCM-enabled boot can never silently suppress web-push.

Finding 2 (transient FCM errors no longer unregister devices): sendToToken() now returns 'sent' | 'invalid' | 'failed'. Only 'invalid' (HTTP 404 with UNREGISTERED/NOT_FOUND, or 400 INVALID_ARGUMENT against the token field) calls removeDeviceByToken(). 429 / 5xx / 401 / 403 / network errors are now 'failed' - they count toward the fail tally without dropping the registration.

11 new tests across fcmService.test.ts and nativeFallbackProbe.test.ts covering both fixes, including a mixed-batch case that proves invalid tokens get removed in the same call where transient-failure tokens survive. Hub test count 273 → 284, all passing.

Thanks to the bot for catching both - especially the device-unregistration one, which would have been a really mysterious "I have to re-pair every couple of days" bug in production.

github-actions

Findings

[Minor] Contract documents the wrong FCM visibility rule — the docs say FCM only sends when web push would fire and the PWA is not visible, but FcmNotificationChannel.deliver() always calls FCM and the new test asserts delivery even with a visible PWA. Companion apps built from this contract can suppress foreground handling and miss notifications. Evidence docs/api/native-companion-contract.md:46.
Suggested fix:
```
Hub sends FCM HTTP v1 whenever a notification event is emitted for a namespace with registered native devices and FCM is configured. Web Push is suppressed for that namespace to avoid duplicate OS notifications.
```
[Minor] Contract advertises an unsupported session-completed event — NotificationHub only calls channels with sendSessionCompletion, and the new FcmNotificationChannel does not implement it, so native companions will never receive the documented session-completed type. Evidence docs/api/native-companion-contract.md:52.
Suggested fix:
```
| `type` | `ready` | `ready`, `permission-request`, `task-notification` |
```

Summary

Review mode: follow-up after new commits
Prior bot findings are addressed at the current head. Remaining issues are contract accuracy gaps in the new native companion docs.

Testing

Not run (automation): bun is not installed in this runner (bun: command not found).

HAPI Bot

…ent type HAPI Bot review on PR tiann#803 caught two contract-doc accuracy gaps: 1) Visibility rule was wrong. Doc said "FCM fires when Web Push would fire AND client not visible via SSE", but FcmNotificationChannel ALWAYS fires regardless of PWA visibility (deliberately - native companion is the canonical wrist-first surface, and there is a passing test asserting this). Companion app implementers reading the contract would have built foreground-suppression logic and then dropped notifications when the PWA tab was open. 2) Documented `session-completed` event doesn't exist. NotificationHub never calls into a 'session-completed' channel method on FcmNotificationChannel; the type would never reach a native client. Removed from the documented enum, leaving only the three actual events: ready, permission-request, task-notification. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Nit] Remove trailing whitespace — git diff --check origin/main...HEAD fails because the new Audience line ends with Markdown hard-break spaces. Evidence docs/api/native-companion-contract.md:3.
Suggested fix:

**Audience:** Implementers of native companion apps (Android phone + Wear OS, iOS, etc.) that pair with a hapi hub via FCM.
**Auth:** Same JWT as the web client (`POST /api/bind` → `Authorization: Bearer`).

Summary
Review mode: follow-up after new commits
Prior bot findings are addressed at the current head. One whitespace issue remains from the latest full diff review.

Testing
Not run (automation): bun is not installed in this runner (command -v bun returned non-zero). Static check run: git diff --check origin/main...HEAD fails as noted above.

HAPI Bot

github-actions

Findings

[Major] Persist bind token before rendering the pairing QR — CompanionPairing only builds the deeplink from hapi_access_token::<baseUrl> in localStorage, but Telegram binding authenticates via bind() and never stores the CLI token entered on the bind screen (web/src/App.tsx:364, web/src/hooks/useAuth.ts:147). A Telegram user can successfully bind the Mini App, open Settings, and still hit the “signed in via Telegram...” fallback, so the new Settings companion pairing QR is unavailable for a primary entry path. Evidence web/src/components/settings/CompanionPairing.tsx:31.
Suggested fix:
```
// web/src/hooks/useAuth.ts
const ACCESS_TOKEN_PREFIX = 'hapi_access_token::'

function rememberAccessToken(baseUrl: string, accessToken: string): void {
    try {
        localStorage.setItem(`${ACCESS_TOKEN_PREFIX}${baseUrl}`, accessToken)
    } catch {
        // Ignore storage errors
    }
}

// inside bind(), after successful client.bind(...)
rememberAccessToken(baseUrl, accessToken)
```

Summary
Review mode: follow-up after new commits
Prior whitespace finding is fixed at the current head. One functional issue remains in the Settings companion pairing flow.

Testing
Not run (automation): bun is not installed in this runner (command -v bun returned no path). Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

github-actions

Findings

[Major] Do not suppress Web Push before FCM actually succeeds — the new native fallback probe skips the PWA/SSE path whenever a device row exists, but FCM delivery can still fail transiently or during OAuth token exchange. FcmService.sendToNamespace() explicitly treats 401/429/5xx/network errors as failures while keeping the device registered, and getFcmAccessToken() can throw before any send. In those cases this branch returns before Web Push/SSE runs, so permission/ready/task notifications disappear for the namespace. Evidence hub/src/push/pushNotificationChannel.ts:124, related FCM failure paths hub/src/fcm/fcmService.ts:70 and hub/src/fcm/fcmService.ts:144.
Suggested fix:
```
// Make native delivery own the fallback decision instead of suppressing
// Web Push from a separate channel based only on registration.
private async sendNativeFirst(
    session: Session,
    fcmPayload: FcmSendPayload,
    sendWebFallback: () => Promise<void>
): Promise<void> {
    const result = await this.fcmService.sendToNamespace(session.namespace, fcmPayload)
        .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] }))

    if (result.sent === 0) {
        await sendWebFallback()
    }
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
One Major issue remains: native registration can suppress all web notification fallbacks even when FCM does not deliver. Prior Telegram bind-token finding appears fixed in web/src/hooks/useAuth.ts.

Testing

Not run (automation): bun is not installed in this runner.
Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

github-actions

Findings

[Major] FCM health gate still drops first failed native notifications — PushNotificationChannel returns before the FCM channel has attempted delivery, while the new health gate starts healthy and stays healthy until five recorded failures. After a hub restart or credential/OAuth/network outage, the first several permission/ready/task notifications for a namespace with a registered device still skip SSE/Web Push and then fail in FCM, so the operator receives nothing. Evidence hub/src/push/pushNotificationChannel.ts:76, related health threshold hub/src/fcm/fcmService.ts:85.
Suggested fix:
```
private async deliverNativeFirst(
    session: Session,
    webPayload: PushPayload,
    fcmPayload: FcmSendPayload
): Promise<void> {
    const result = await this.fcmService
        .sendToNamespace(session.namespace, fcmPayload)
        .catch(() => ({ sent: 0, failed: 1, invalidTokens: [] }))

    if (result.sent > 0) return
    await this.deliverWebOrToast(session, webPayload)
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
One Major issue remains: the new health probe reduces prolonged notification loss, but it still suppresses web/SSE fallback for the first failed FCM notifications after startup or a fresh outage.

Testing

Not run (automation): bun is not installed in this runner.
Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

heavygee · 2026-06-05T01:30:23Z

Status on `c0217fc`: both CI checks green; one open design disagreement with the bot

pr-review ✓ and test ✓ on the latest head. The remaining bot Major (#3 in this thread) is a real design trade-off rather than a code-level bug, and I'd like @tiann's call before iterating further.

What the bot wants

Invert the channel: PushNotificationChannel calls fcmService.sendToNamespace() first and falls back to web-push when result.sent === 0.

Why I implemented a health-gated probe instead

The bot's exact fix has a subtle gap that the rolling health gate is also subject to, just with different timing:

Scenario	Bot's fix (`sent === 0` -> fallback)	Health probe (`5+/8 failures` -> fallback)	Current PR (probe + health)
FCM happy path	✓ no duplicates	✓ no duplicates	✓ no duplicates
FCM 503 burst (transient)	every event misses+fallback fires; if FCM later catches up = duplicate	first ~5 events missed, then web-push covers	first ~5 events missed, then web-push covers
FCM credentials expired	every event has fallback; but if FCM HTTP times out and Google retries it later -> duplicate	first ~5 events missed, then web-push covers	first ~5 events missed, then web-push covers
FCM cold start, no history	always tries FCM first (sequential) -> latency on every event	optimistic, no first-event penalty	optimistic, no first-event penalty
Architectural cost	`PushNotificationChannel` now imports `FcmService` + `FcmSendPayload`; channels lose independence	one shared `isHealthy()` signal; channels stay parallel	one shared `isHealthy()` signal; channels stay parallel

There is no design that gives both "no first-failure missed" and "zero duplicates on FCM HTTP-timeout-then-retry-delivers." The wrist-first contract documented in docs/api/native-companion-contract.md was an explicit choice to prioritise no-duplicates over no-misses. The health probe softens the "credentials silently expired" worst case (which was the bot's original concern) without giving up the duplicate-free guarantee on the happy path.

Two questions for @tiann

Are you OK with the current trade-off (~5 missed events on a sustained outage, never duplicate on transient timing weirdness), or would you rather take the bot's invert-and-fall-back approach (no first-event miss, occasional duplicate on FCM-timeout-then-Google-retries)?
If you prefer the latter, happy to refactor channel orchestration up to NotificationHub so PushNotificationChannel doesn't grow an import on FcmService.

CI is green either way. Defer to your call.

heavygee · 2026-06-05T21:53:27Z

Status on `13d69f3` (merge with base)

Merge with tiann/hapi:main: clean (single 4-line import-block conflict in web/src/routes/settings/index.tsx, both sides additive, kept the union).
Mergeable state: MERGEABLE / UNSTABLE (no real blocker).
test check: GREEN. Locally: hub 303/303, web 761/761, shared 91/91, both typechecks clean.
pr-review check (Codex): FAILURE on infra, not on code. Three consecutive runs all died with the same upstream error:
```
unexpected status 530 ... error code: 1033, url: http://127.0.0.1:.../v1/responses, cf-ray: ...
Error: codex exited with code 1
```
That's a Cloudflare 1033 (Argo tunnel) on OpenAI's Codex backend - the action never reached its model. Re-running just gets a fresh cf-ray, same outcome. Worth waiting for OpenAI to recover, or running once more later.

The previous bot run on c0217fc (before this merge) is the last successful one - findings carried forward to that comment thread; nothing new on the merge other than what git diff shows for the upstream catch-up (voice picker, session export, codex import filter, release 0.20.0). The FCM-side code on this branch is unchanged from c0217fc.

github-actions

Findings

[Major] Native fallback still drops notifications on initial FCM failures — PushNotificationChannel suppresses both SSE toast and Web Push as soon as the native probe returns true, before the FCM channel has attempted this notification. The probe starts healthy and remains healthy until five failures, so after a hub restart with bad credentials/OAuth/network outage, the first failed ready/permission/task notifications still return here and FCM only records the failure later in the next channel pass. Evidence hub/src/push/pushNotificationChannel.ts:76, related health gate hub/src/fcm/fcmService.ts:85 and channel order hub/src/startHub.ts:227.
Suggested fix:
```
type NativeDelivery = (namespace: string, payload: PushPayload) => Promise<{ sent: number }>

private async deliverNativeFirst(
    session: Session,
    webPayload: PushPayload,
    nativePayload: PushPayload,
    deliverNative?: NativeDelivery
): Promise<void> {
    if (deliverNative) {
        const result = await deliverNative(session.namespace, nativePayload).catch(() => ({ sent: 0 }))
        if (result.sent > 0) return
    }

    await this.deliverWebOrToast(session, webPayload)
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
One Major issue remains from the prior bot pass: the health probe helps after enough FCM failures have been recorded, but the current push channel can still mute all non-native surfaces for the first failed notifications after startup or a fresh outage.

Testing

Not run (automation): bun is not available in this runner PATH.
Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

github-actions

Findings

[Major] Native fallback still drops notifications on initial FCM failures — PushNotificationChannel suppresses SSE/Web Push before the FCM channel attempts the same notification. The probe is treated as healthy with an empty failure window, so after a hub restart with bad credentials/OAuth/network outage, this branch returns first and the later FCM channel only records the failure after all user-visible fallback surfaces have already been skipped. Evidence hub/src/push/pushNotificationChannel.ts:76, related channel order hub/src/startHub.ts:227 and empty-buffer health gate hub/src/fcm/fcmService.ts:85.
Suggested fix:
```
type NativeDelivery = (namespace: string, payload: FcmSendPayload) => Promise<{ sent: number }>

private async deliverNativeFirst(
    session: Session,
    nativePayload: FcmSendPayload,
    webPayload: PushPayload,
    deliverNative?: NativeDelivery
): Promise<void> {
    if (deliverNative) {
        const result = await deliverNative(session.namespace, nativePayload).catch(() => ({ sent: 0 }))
        if (result.sent > 0) return
    }

    await this.deliverWebOrToast(session, webPayload)
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
One Major issue remains from the prior bot pass: the new health probe helps after enough failures have been recorded, but it still mutes all non-native surfaces for the first failed native notifications after startup or a fresh outage.

Testing

Not run (automation/security policy: no PR code execution). Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

…ng web-push Addresses HAPI Bot Major review on PR tiann#803. The previous health gate treated an empty outcome buffer as healthy ("innocent until proven guilty"). That created a silent-blackhole window on cold start with broken FCM credentials: the push channel suppressed SSE/Web Push for the first ~5 events while the FCM channel attempted each delivery and recorded failures, until enough stacked to flip the threshold. Every notification in that gap was silently lost. New invariant: isHealthy() requires at least one successful FCM send in the recent window (HEALTH_WINDOW=8) AND failures below threshold (HEALTH_FAILURE_THRESHOLD=5). Both conditions are necessary; either alone is insufficient evidence to safely suppress web-push fallback. Trade-off: one duplicated notification per hub restart per namespace. On the first event after restart, web-push fires alongside FCM (because the gate has no positive evidence yet). Once FCM records that first success, the gate engages and subsequent events are FCM-only. Worth it for guaranteed delivery during cold-start outages. Tests reworked to match new semantics: - "starts UNHEALTHY with empty buffer" (was: healthy) - "flips to healthy after first successful send" (new) - "stays unhealthy across failures-only run" (new, exercises the exact blackhole scenario the bot flagged) - "flips back to unhealthy after threshold breach with prior successes" (renamed, establishes successes first) - "invalid tokens don't count against health" (reworked: send a mixed batch first to establish health, then verify invalids don't flip it) - "network errors count as failures" (reworked: establish health first) Hub tests: 313 pass / 0 fail. typecheck green. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Native fallback still drops notifications during a fresh FCM outage after a prior success — PushNotificationChannel suppresses SSE/Web Push before the FCM send for the same notification. The new cold-start guard helps while FcmService has no successes, but after any successful send, isHealthy() stays true until five failures accumulate, so the first failed permission/ready/task notifications in a new OAuth/network/5xx outage are skipped by web-push at hub/src/push/pushNotificationChannel.ts:76 and then fail in FCM. Evidence: hub/src/push/pushNotificationChannel.ts:76, hub/src/fcm/fcmService.ts:99.
Suggested fix:
```
type NativeDelivery = (namespace: string, payload: FcmSendPayload) => Promise<{ sent: number }>

private async deliverNativeFirst(
    session: Session,
    nativePayload: FcmSendPayload,
    webPayload: PushPayload,
    deliverNative?: NativeDelivery
): Promise<void> {
    if (deliverNative) {
        const result = await deliverNative(session.namespace, nativePayload).catch(() => ({ sent: 0 }))
        if (result.sent > 0) return
    }

    await this.deliverWebOrToast(session, webPayload)
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
One Major issue remains: the cold-start gap from the previous pass is covered, but the same pre-send suppression still drops notifications during a fresh FCM outage after the service has previously been healthy.

Testing

Not run (automation environment: bun is not installed). Static check: git diff --check origin/main...HEAD passed.

HAPI Bot

heavygee · 2026-06-08T00:55:32Z

Status on `80109354`: cold-start gap closed; warm-outage gap is the same probe-vs-invert call

pr-review ✓ and test ✓ on the latest head. CI is green; mergeStateStatus: CLEAN.

What changed since `c0217fc`

Two follow-up commits:

eda8c900 - drops a duplicate formatToolArgumentsDetailed in hub/src/telegram/sessionView.ts (cold-reviewed by an out-of-context Claude Opus peer; +2/-76 lines, behaviourally equivalent).
80109354 - addresses the bot's previous cold-start Major: isHealthy() now requires positive evidence in the recent window, not just sub-threshold failures. Empty buffer or failures-only buffer renders unhealthy. Closes the silent-blackhole-on-broken-cold-start window the bot identified, at a cost of one duplicated notification per hub restart per namespace.

What the bot says about `80109354`

The bot acknowledges the cold-start gap is covered ("the cold-start gap from the previous pass is covered") and has narrowed its Major to: after a prior success, a fresh OAuth/network/5xx outage will silently drop the first ~4 notifications before the failure threshold trips.

That's true. It's the same probe-vs-invert architectural trade-off from my previous comment, now scoped to warm-outage rather than cold-start. The choices are still:

Keep the probe: occasional ~4-event silent gap during sustained warm outages; zero spurious duplicates on transient blips.
Lower the threshold: shorter silent gap; more spurious duplicates on every transient blip.
Invert channels (bot's suggested fix): no silent gap; couples PushNotificationChannel synchronously to FcmService; rare duplicate when FCM HTTP times out and Google retries later.

Position unchanged: this is your call. CI is green either way. Happy to do the invert if you want it; otherwise the current shape is what's on the branch.

UI-only banner + amber pulse only fire for operators who happen to be looking at the web UI. The whole point of model-error detection is to catch the scenario where the agent reports "all done" with a green dot, but actually hit a quota / transport / RPC failure mid-turn -- exactly the moment the operator walks away. Wire model errors into the existing NotificationHub so they ping the operator's phone / Telegram / wrist *regardless* of whether the web UI is in focus. Plumbing: - ModelErrorNotification type carrying kind, transient, rawSnippet, priorAssistantClaimsDone, and atTs (used as dedup key). - Optional sendModelError on NotificationChannel (mirrors sendSessionCompletion's optional shape; channels that don't render error variants just no-op). - NotificationHub subscribes to session-updated; tracks last-notified atTs per session so we fire once per distinct error event, dedupes repeat session-updated events, skips already-acknowledged errors. Channel implementations (this branch): - Push (VAPID web push): skips the in-page-toast shortcut so a backgrounded operator gets a real system-tray ping. Tag includes atTs so distinct errors don't overwrite. - Telegram: leading siren + bold "Model error - <kind>" so the chat glance is visually distinct from ready/task copy. Shared copy helper (modelErrorCopy.ts) gives every channel the same kind-aware title and prior-claims-done warning body, so wrist / browser toast / Telegram all read consistently. FCM (companion app) sendModelError is added in a follow-up, since FCM lives on a separate open upstream PR (tiann#803) and we don't want to entangle the two branches. Co-authored-by: Cursor <cursoragent@cursor.com>

Two bugs surfaced by the upstream review bot: 1) Web Push silently dropped when FCM is not actually configured. The native-fallback probe only checked the device registry; it did not check whether resolveFcmConfig() actually succeeded. So an operator who previously enabled FCM, registered a phone, then later started the hub WITHOUT FCM_SERVICE_ACCOUNT_PATH would see the probe return true (devices still in DB) -> Web Push suppressed -> no FCM channel registered -> notifications go to /dev/null. Fix: extracted the probe construction into buildNativeFallbackProbe() which short-circuits to () => false when fcmConfig is missing. Probe never even consults the device store in the no-config branch, so stale rows can never matter. 2) Transient FCM failures permanently unregistered devices. sendToToken() returned a single boolean and sendToNamespace() removed any device whose send returned false. A 429 (rate limit), 503 (server error), 401 (auth glitch), or even an ECONNREFUSED would delete the device row, after which the user would need to re-pair to get notifications again. The bot caught it; the fix is the obvious one. Fix: sendToToken() now returns 'sent' | 'invalid' | 'failed'. - 'invalid' is reserved for the responses that genuinely indicate a dead token: HTTP 404 with UNREGISTERED/NOT_FOUND, and HTTP 400 with INVALID_ARGUMENT explicitly referencing the token field. - Everything else (429, 5xx, 401, 403, network errors) is 'failed' and counts toward the failed tally without removing the device. sendToNamespace() only calls removeDeviceByToken() on 'invalid'. Tests: 11 new tests across two new files. fcmService.test.ts covers all six branches (200, 404 unregistered, 429, 503, 401, network error) plus a mixed-batch case that proves invalid tokens get removed in the same call where transient-failure tokens survive. nativeFallbackProbe .test.ts covers both no-config and configured branches plus the explicit "no-config never touches the store" guarantee. Hub test count: 273 -> 284 (all passing). Co-authored-by: Cursor <cursoragent@cursor.com>

…ent type HAPI Bot review on PR tiann#803 caught two contract-doc accuracy gaps: 1) Visibility rule was wrong. Doc said "FCM fires when Web Push would fire AND client not visible via SSE", but FcmNotificationChannel ALWAYS fires regardless of PWA visibility (deliberately - native companion is the canonical wrist-first surface, and there is a passing test asserting this). Companion app implementers reading the contract would have built foreground-suppression logic and then dropped notifications when the PWA tab was open. 2) Documented `session-completed` event doesn't exist. NotificationHub never calls into a 'session-completed' channel method on FcmNotificationChannel; the type would never reach a native client. Removed from the documented enum, leaving only the three actual events: ready, permission-request, task-notification. Co-authored-by: Cursor <cursoragent@cursor.com>

…ng web-push Addresses HAPI Bot Major review on PR tiann#803. The previous health gate treated an empty outcome buffer as healthy ("innocent until proven guilty"). That created a silent-blackhole window on cold start with broken FCM credentials: the push channel suppressed SSE/Web Push for the first ~5 events while the FCM channel attempted each delivery and recorded failures, until enough stacked to flip the threshold. Every notification in that gap was silently lost. New invariant: isHealthy() requires at least one successful FCM send in the recent window (HEALTH_WINDOW=8) AND failures below threshold (HEALTH_FAILURE_THRESHOLD=5). Both conditions are necessary; either alone is insufficient evidence to safely suppress web-push fallback. Trade-off: one duplicated notification per hub restart per namespace. On the first event after restart, web-push fires alongside FCM (because the gate has no positive evidence yet). Once FCM records that first success, the gate engages and subsequent events are FCM-only. Worth it for guaranteed delivery during cold-start outages. Tests reworked to match new semantics: - "starts UNHEALTHY with empty buffer" (was: healthy) - "flips to healthy after first successful send" (new) - "stays unhealthy across failures-only run" (new, exercises the exact blackhole scenario the bot flagged) - "flips back to unhealthy after threshold breach with prior successes" (renamed, establishes successes first) - "invalid tokens don't count against health" (reworked: send a mixed batch first to establish health, then verify invalids don't flip it) - "network errors count as failures" (reworked: establish health first) Hub tests: 313 pass / 0 fail. typecheck green. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Web Push is still suppressed before knowing whether FCM delivered — NotificationHub invokes PushNotificationChannel before FcmNotificationChannel, so this new early return runs while FcmService still reflects previous sends. After any prior FCM success, isHealthy() remains true until five failures accumulate, so the first failed ready/permission/task notifications in a fresh OAuth/network/5xx outage are dropped by Web Push and then fail in FCM. Evidence: hub/src/push/pushNotificationChannel.ts:76, hub/src/fcm/fcmService.ts:95.
Suggested fix:
```
type NativeDelivery = (session: Session, payload: PushPayload) => Promise<boolean>

private async deliverWithNativeFallback(
    session: Session,
    payload: PushPayload,
    deliverNative?: NativeDelivery
): Promise<void> {
    if (deliverNative && await deliverNative(session, payload).catch(() => false)) {
        return
    }

    await this.deliverWebOrToast(session, payload)
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
One Major issue remains: the cold-start case is improved, but the suppression decision is still made from stale FCM health rather than this notification's FCM send result.

Testing

Not run (automation: bun unavailable in this runner). Static validation: git diff --check passed.

HAPI Bot

github-actions

Findings

No issues found.

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus previous HAPI Bot review context. The prior Major stale FCM/Web Push suppression issue appears addressed by the FCM-first channel ordering and per-dispatch nativeGate fallback.
Residual risk: FCM delivery and companion pairing behavior were validated statically here because Bun is unavailable in this runner.

Testing

Not run (automation: bun unavailable in this runner). Static validation: git diff --check passed.

HAPI Bot

Adds opt-in FCM HTTP v1 notification delivery so a companion mobile/wearable app can receive permission, ready, and task notifications end-to-end. The channel is gated entirely on FCM_SERVICE_ACCOUNT_PATH + FCM_PROJECT_ID being set; operators not running a companion see zero behavior change. What lands: - POST/DELETE /api/devices/register — JWT-authed FCM token registry, upsert on (namespace, deviceId, platform), platforms `phone` | `wear`. - Sqlite v9 → v10 migration adds `fcm_devices` (idx on namespace + token). - FcmService — minimal HTTP v1 client, RS256 service-account JWT via jose (dep already in tree), 5-minute access-token cache, 401 retry. - FcmNotificationChannel — implements NotificationChannel, sends data-only FCM (so companion can route to phone+watch surfaces). Body composition parses an optional trailing `AGENT_NOTIFY_SUMMARY {json}` line for richer ready summaries; truncates plain assistant text to 280 chars otherwise. Tags each payload with `severity` (info/warning/success/error) so clients can color/categorise the notification. - PushNotificationChannel gains a NativeFallbackProbe — when a namespace has at least one registered FCM device, web-push and SSE in-page toast are skipped so the operator does not double-notify on phone+browser. Probe is no-op when no FCM device is registered; PWA-only setups unchanged. Branch trace gated on HAPI_NOTIFY_DEBUG=1. - shared/src/messages.ts — `extractAssistantPlainText` (codex + Claude SDK shapes) and `extractNotifySummary` (strict end-anchored line parser). - hub/src/notifications/toolArgs.ts — tool-arg formatters lifted out of telegram/sessionView (kept duplicated there in this PR; refactor of Telegram is a follow-up). - docs/api/native-companion-contract.md — payload + endpoints + env vars, versioned at contract v1. Test coverage: - 260 hub tests pass (incl. 23 new across FCM channel, push dedup, v10 migration, devices route). - 60 shared tests pass (messages parsers). Notes for reviewers: - Reference companion implementation lives in a separate Android repo (Kotlin, phone APK + Wear OS APK) — this PR is hub-side only. - No new runtime deps (`jose` and `zod` already declared in hub). Co-authored-by: Cursor <cursoragent@cursor.com>

…ub-on-phone Adds a Scope section to the native-companion contract so anyone implementing it knows the audience: operators running the hub on a server who want phone/watch as a notification surface, not users expecting a Termux-bundled hub. Mirrors the framing now in heavygee/hapi-companion README. Co-authored-by: Cursor <cursoragent@cursor.com>

Removes the prior framing that referenced a non-existent 'Termux hub-on-phone' alternative. This contract describes a native client to the same hub the PWA talks to; it does not change where the hub runs. Co-authored-by: Cursor <cursoragent@cursor.com>

Companion section in Settings renders a QR code encoding the deeplink hapicompanion://bind?hub=<base>&code=<token>. Scanning it from the HAPI companion app (Android phone or Wear OS) auto-fills the bind form and authenticates against this hub - no manual URL/token paste. QR is gated behind a Show button so the access token doesn't sit visible on screen by default; a Copy link affordance and the textual deeplink are also exposed for manual onboarding. Adds qrcode + @types/qrcode to web/ (already a hub dep, no new resolved package - just a workspace declaration). Co-authored-by: Cursor <cursoragent@cursor.com>

After the existing PWA access QR is rendered on tunnel start, also print the hapicompanion://bind?hub=...&code=... deeplink and a matching QR. Same tunnel + token, different scheme: phones with the companion app installed pick up the deeplink via the manifest intent filter; phones without it ignore it and fall back to the PWA QR above. QR rendering failure is non-fatal in both cases - the textual deeplink above the QR is sufficient for manual paste. Co-authored-by: Cursor <cursoragent@cursor.com>

Two bugs surfaced by the upstream review bot: 1) Web Push silently dropped when FCM is not actually configured. The native-fallback probe only checked the device registry; it did not check whether resolveFcmConfig() actually succeeded. So an operator who previously enabled FCM, registered a phone, then later started the hub WITHOUT FCM_SERVICE_ACCOUNT_PATH would see the probe return true (devices still in DB) -> Web Push suppressed -> no FCM channel registered -> notifications go to /dev/null. Fix: extracted the probe construction into buildNativeFallbackProbe() which short-circuits to () => false when fcmConfig is missing. Probe never even consults the device store in the no-config branch, so stale rows can never matter. 2) Transient FCM failures permanently unregistered devices. sendToToken() returned a single boolean and sendToNamespace() removed any device whose send returned false. A 429 (rate limit), 503 (server error), 401 (auth glitch), or even an ECONNREFUSED would delete the device row, after which the user would need to re-pair to get notifications again. The bot caught it; the fix is the obvious one. Fix: sendToToken() now returns 'sent' | 'invalid' | 'failed'. - 'invalid' is reserved for the responses that genuinely indicate a dead token: HTTP 404 with UNREGISTERED/NOT_FOUND, and HTTP 400 with INVALID_ARGUMENT explicitly referencing the token field. - Everything else (429, 5xx, 401, 403, network errors) is 'failed' and counts toward the failed tally without removing the device. sendToNamespace() only calls removeDeviceByToken() on 'invalid'. Tests: 11 new tests across two new files. fcmService.test.ts covers all six branches (200, 404 unregistered, 429, 503, 401, network error) plus a mixed-batch case that proves invalid tokens get removed in the same call where transient-failure tokens survive. nativeFallbackProbe .test.ts covers both no-config and configured branches plus the explicit "no-config never touches the store" guarantee. Hub test count: 273 -> 284 (all passing). Co-authored-by: Cursor <cursoragent@cursor.com>

…ent type HAPI Bot review on PR tiann#803 caught two contract-doc accuracy gaps: 1) Visibility rule was wrong. Doc said "FCM fires when Web Push would fire AND client not visible via SSE", but FcmNotificationChannel ALWAYS fires regardless of PWA visibility (deliberately - native companion is the canonical wrist-first surface, and there is a passing test asserting this). Companion app implementers reading the contract would have built foreground-suppression logic and then dropped notifications when the PWA tab was open. 2) Documented `session-completed` event doesn't exist. NotificationHub never calls into a 'session-completed' channel method on FcmNotificationChannel; the type would never reach a native client. Removed from the documented enum, leaving only the three actual events: ready, permission-request, task-notification. Co-authored-by: Cursor <cursoragent@cursor.com>

…h break Co-authored-by: Cursor <cursoragent@cursor.com>

…works The Settings -> Companion pairing QR reads the original CLI access token from localStorage (hapi_access_token::<baseUrl>) so it can be encoded into the hapicompanion://bind deeplink. For browser/CLI logins useAuthSource already persists the token via setAccessToken, but the Telegram Mini App bind path went through useAuth.bind() which exchanged the typed CLI token for a JWT and never persisted it. Telegram users therefore always saw the "signed in via Telegram..." fallback and got no usable QR. After a successful client.bind() we now mirror useAuthSource's behavior and write the same accessToken to the same localStorage key, restoring parity between the two auth paths. No change for browser/CLI users. Co-authored-by: Cursor <cursoragent@cursor.com>

The native-fallback probe previously returned true whenever FCM was configured AND devices were registered, which suppressed web-push for the namespace. The HAPI Bot correctly pointed out the gap: if the FCM pipeline silently breaks (expired service-account key, sustained 5xx, OAuth token-fetch failure, network blackhole) the operator gets nothing on either channel until they manually intervene. Approach (deliberate, not the bot's exact suggested fix): - FcmService now keeps a small rolling window (last 8 outcomes) of send attempts and exposes `isHealthy()`. The threshold is 5+/8 failures = unhealthy; the buffer starts empty so a freshly-booted hub is optimistic ("innocent until proven guilty") and does not double-fire on event #1. - Token-fetch failure (`getFcmAccessToken` throws) now records exactly one health-failure (not one per device), short-circuits the send loop, and returns a result so `sendToNamespace` no longer leaks the exception. - `invalid` token responses are explicitly excluded from the health buffer because they are per-device facts (rotated/uninstalled token), not pipeline failures - FCM was reachable, it just rejected one stale token. - `buildNativeFallbackProbe` now optionally accepts the FcmService and short-circuits to "let web-push fire" when health is bad, before it even queries the device registry. The single-arg call shape is still supported for back-compat. Why not the bot's exact suggestion ("invert: call FCM first, fall back on result.sent === 0"): - Couples PushNotificationChannel to FcmService and FcmSendPayload, reversing the clean parallel-channel architecture established earlier in this PR. - Treats every transient single-event failure as fallback-worthy, which re-opens the duplicate-notification race that the suppression logic was added to close (FCM HTTP timeout that delivers later + the web push we sent in the meantime = two pings). - A rolling health window only flips on sustained breakage, which is the actual operational scenario the bot is worried about. The wrist-first design intent ("FCM fires unconditionally, web-push is suppressed for the same namespace") documented in docs/api/native-companion-contract.md is preserved on the happy path. The probe only re-enables web-push when there is concrete evidence the native pipeline is not delivering. Tests: - New FcmService.isHealthy suite covers empty-buffer, threshold flip, recovery as failures age out of the window, invalid-token exclusion, and network-error path. - nativeFallbackProbe gains coverage for the unhealthy-but-registered, healthy-and-registered, and absent-fcmService (back-compat) cases. - All 292 hub tests still pass; typecheck clean. Co-authored-by: Cursor <cursoragent@cursor.com>

…dule The Telegram session view had its own copy of formatToolArgumentsDetailed identical to the one in hub/src/notifications/toolArgs.ts (already used by the FCM channel). Replace the local copy with an import. Removes ~70 lines of duplication, plus the now-unused MAX_TOOL_ARGS_LENGTH constant and `truncate` import. The shared signature accepts an optional opts arg whose default maxArgLength is 150 - matching the prior constant - so the call site is unchanged. Two benign upgrades come along for the ride from the shared module: ?? instead of || on field fallbacks (no real-world difference; permission arguments never carry empty-string fields), and String(...) wrapping plus a typeof object guard that makes non-string values render gracefully instead of throwing into the catch block. Hub tests: 311 pass / 0 fail. Telegram subset: 5 pass / 0 fail. typecheck green. Cold-reviewed by an out-of-context Claude Opus peer before push. Co-authored-by: Cursor <cursoragent@cursor.com>

…ng web-push Addresses HAPI Bot Major review on PR tiann#803. The previous health gate treated an empty outcome buffer as healthy ("innocent until proven guilty"). That created a silent-blackhole window on cold start with broken FCM credentials: the push channel suppressed SSE/Web Push for the first ~5 events while the FCM channel attempted each delivery and recorded failures, until enough stacked to flip the threshold. Every notification in that gap was silently lost. New invariant: isHealthy() requires at least one successful FCM send in the recent window (HEALTH_WINDOW=8) AND failures below threshold (HEALTH_FAILURE_THRESHOLD=5). Both conditions are necessary; either alone is insufficient evidence to safely suppress web-push fallback. Trade-off: one duplicated notification per hub restart per namespace. On the first event after restart, web-push fires alongside FCM (because the gate has no positive evidence yet). Once FCM records that first success, the gate engages and subsequent events are FCM-only. Worth it for guaranteed delivery during cold-start outages. Tests reworked to match new semantics: - "starts UNHEALTHY with empty buffer" (was: healthy) - "flips to healthy after first successful send" (new) - "stays unhealthy across failures-only run" (new, exercises the exact blackhole scenario the bot flagged) - "flips back to unhealthy after threshold breach with prior successes" (renamed, establishes successes first) - "invalid tokens don't count against health" (reworked: send a mixed batch first to establish health, then verify invalids don't flip it) - "network errors count as failures" (reworked: establish health first) Hub tests: 313 pass / 0 fail. typecheck green. Co-authored-by: Cursor <cursoragent@cursor.com>

…9→V10 Upstream/main landed sessions.service_tier at schema v10. The companion FCM device registry now migrates at v11 so both changes compose cleanly after the courtesy rebase onto current upstream/main. Co-authored-by: Cursor <cursoragent@cursor.com>

FCM runs before web-push; PushNotificationChannel skips web/SSE only when the same notify() dispatch already delivered via FCM. Removes the isHealthy()+device-row probe that could suppress web-push after warm FCM outages. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Cap AGENT_NOTIFY_SUMMARY before putting it into FCM data — the normal ready fallback is capped at 280 chars, but the structured-summary branch copies arbitrary summary/action text into both body and data.notifySummary. Because FcmService sends those values as FCM data strings, a long agent-emitted summary can push the request over FCM data-size limits and make native companion delivery fail for that ready event. Evidence: hub/src/fcm/fcmNotificationChannel.ts:146, hub/src/fcm/fcmService.ts:167.
Suggested fix:

const summaryLine = truncateReadyText(summary.summary, READY_BODY_GLANCE_LIMIT)
const actionLine = summary.action && summary.action !== summary.summary
    ? truncateReadyText(`-> ${summary.action}`, READY_BODY_GLANCE_LIMIT - summaryLine.length - 1)
    : ''
const body = [summaryLine, actionLine].filter(Boolean).join('\n')
return {
    title: headerTitle,
    body,
    notifySummary: { ...summary, summary: summaryLine, action: actionLine.replace(/^-> /, '') }
}

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus previous HAPI Bot review context. One native-notification reliability issue remains around unbounded structured ready summaries.

Testing

Not run (automation: bun unavailable in this runner). Static validation: git diff --check passed.

HAPI Bot

Rebase follow-up: truncate AGENT_NOTIFY_SUMMARY summary/action before FCM data payload (bot Major). Fix usePwaUpdate.test.ts setTimeout mock cast so bun typecheck passes on current main. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Keep notifySummary under FCM data limits — summary and action are now capped, but the object still spreads the rest of the parsed AGENT_NOTIFY_SUMMARY fields into data.notifySummary. extractNotifySummary accepts agent, project, and status as arbitrary strings, so a long value in any of those fields can still make the FCM data payload exceed Firebase's size limit and drop the ready notification. Evidence: hub/src/fcm/fcmNotificationChannel.ts:158, related parser fields at shared/src/messages.ts:165.
Suggested fix:

const notifySummary = {
    ...(typeof summary.version === 'number' ? { version: summary.version } : {}),
    summary: summaryLine,
    ...(summary.action ? { action: this.truncateReadyText(summary.action, READY_BODY_GLANCE_LIMIT) } : {}),
    ...(summary.status ? { status: this.truncateReadyText(summary.status, 32) } : {}),
    ...(summary.agent ? { agent: this.truncateReadyText(summary.agent, 80) } : {}),
    ...(summary.project ? { project: this.truncateReadyText(summary.project, 80) } : {})
}

[Major] Cap task-notification summaries before sending them through FCM — the new FCM task path copies notification.summary directly into body, and FcmService sends that body as data.body. Task summaries are parsed from system output / XML text with only .trim(), so a long background-task summary can exceed FCM data limits and prevent native delivery for task notifications. Evidence: hub/src/fcm/fcmNotificationChannel.ts:241, related parsing at hub/src/notifications/eventParsing.ts:57.
Suggested fix:
```
const taskSummary = this.truncateReadyText(notification.summary, READY_BODY_GLANCE_LIMIT)
const payload = this.buildPayload({
    title: isFailure ? 'Task failed' : 'Task completed',
    body: `${agentName} · ${name} · ${taskSummary}`,
    type: 'task-notification',
    sessionId: session.id,
    sessionName: name,
    url: path,
    severity: isFailure ? 'error' : 'success'
})
```

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The earlier ready summary/action cap was partly addressed, but FCM still has two unbounded data-payload paths that can break native notification delivery.

Testing

Not run (bun unavailable in this runner). Static validation: git diff --check passed.

HAPI Bot

Whitelist and truncate AGENT_NOTIFY_SUMMARY auxiliary fields before JSON serialization; cap task-notification summaries to glance limit. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Bound FCM network calls so web-push fallback can run — NotificationHub calls channels sequentially with FCM registered before PushNotificationChannel, but both the OAuth token exchange and the FCM send use bare fetch() calls with no timeout. If either Google endpoint stalls, sendToNamespace() never returns, nativeGate never stays false for the push channel, and existing Web Push/SSE fallback is blocked for that notification. Evidence hub/src/fcm/fcmService.ts:199, related channel ordering hub/src/startHub.ts:216.
Suggested fix:
```
const FCM_REQUEST_TIMEOUT_MS = 10_000

response = await fetch(url, {
    method: 'POST',
    headers: {
        authorization: `Bearer ${accessToken}`,
        'content-type': 'application/json'
    },
    body: JSON.stringify({ message }),
    signal: AbortSignal.timeout(FCM_REQUEST_TIMEOUT_MS)
})
```
Also apply the same timeout to the OAuth fetch in getFcmAccessToken() so token acquisition cannot block fallback delivery.
[Major] Truncate Grep/Glob permission details before putting them in FCM data — most tool arguments are capped, but the new shared formatter copies pattern directly for Grep/Glob. FcmNotificationChannel includes that detailed string in data.body, so a long pattern can exceed FCM data limits and drop the native permission notification. Evidence hub/src/notifications/toolArgs.ts:80, consumed by hub/src/fcm/fcmNotificationChannel.ts:55.
Suggested fix:
```
case 'Grep':
case 'Glob': {
    const pattern = (a.pattern as string | undefined) ?? ''
    const path = (a.path as string | undefined) ?? ''
    let result = `Pattern: ${truncate(pattern, maxLen)}`
    if (path) result += `\nPath: ${truncate(path, 80)}`
    return result
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The prior payload-limit findings were addressed, but the FCM path still has a fallback-blocking network wait and one unbounded permission payload field.

Testing

Not run (bun unavailable in this runner). Static validation: git diff --check origin/main...HEAD passed.

HAPI Bot

10s AbortSignal.timeout on OAuth + FCM send so sequential web-push fallback is not blocked on hung Google endpoints; truncate Grep/Glob pattern in permission detail formatter. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Keep each FCM token bound to only one namespace — POST /api/devices/register writes whatever namespace is in the current JWT, but the store only conflicts on (namespace, device_id, platform). If the same native install re-pairs from namespace A to namespace B, the old namespace A row remains with the same FCM token, so future A notifications are still delivered to the device now bound to B. That breaks HAPI's namespace isolation and can leak session/permission details across users/namespaces. Evidence hub/src/store/fcmDevices.ts:39, route caller hub/src/web/routes/devices.ts:27.
Suggested fix:

const params = {
    namespace,
    token: device.token,
    platform: device.platform,
    device_id: device.deviceId,
    created_at: now,
    updated_at: now
}

db.transaction(() => {
    db.prepare(`
        DELETE FROM fcm_devices
        WHERE token = @token
          AND (namespace != @namespace OR device_id != @device_id OR platform != @platform)
    `).run(params)

    db.prepare(`
        INSERT INTO fcm_devices (
            namespace, token, platform, device_id, created_at, updated_at
        ) VALUES (
            @namespace, @token, @platform, @device_id, @created_at, @updated_at
        )
        ON CONFLICT(namespace, device_id, platform)
        DO UPDATE SET
            token = excluded.token,
            updated_at = excluded.updated_at
    `).run(params)
})()

Add a route/store regression test that registers the same token in namespace A, then namespace B, and asserts A has no device rows left.

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The prior timeout and Grep/Glob payload findings are addressed. One remaining issue: FCM token re-pairing can leave a stale device row in the old namespace.

Testing

Not run (bun unavailable in this runner). Static validation: git diff --check origin/main...HEAD passed.

HAPI Bot

Delete stale fcm_devices rows sharing the same token when a native install registers under a different namespace. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Minor] Localize the new Companion settings copy — the Settings page is locale-driven (useTranslation() plus en/zh-CN dictionaries), but this added section renders Companion directly and the new CompanionPairing component renders all user-facing copy as English literals. Users who selected Chinese will get a mixed-language settings screen. Evidence web/src/routes/settings/index.tsx:1264, related copy in web/src/components/settings/CompanionPairing.tsx:74.
Suggested fix:
```
<CompanionPairing baseUrl={baseUrl} t={t} />
```
```
type CompanionPairingProps = {
    baseUrl: string
    t: (key: string) => string
}

{t('settings.companion.title')}
{t('settings.companion.showQr')}
```
Add matching settings.companion.* entries to both web/src/lib/locales/en.ts and web/src/lib/locales/zh-CN.ts.

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The previous namespace isolation finding is addressed. One minor Settings localization regression remains.

Testing

Not run (automation/static review only; did not execute PR code).

HAPI Bot

heavygee · 2026-06-20T16:24:22Z

@tiann Status bump on #803 (16 days open):

Rebased onto current main today — head ab168037
CI: test green on latest push; bot re-review in flight
Bot: all [Major] threads from today's rebase addressed and resolved (FCM payload caps, fetch timeouts, token/namespace re-pair isolation, nativeGate). One [Minor] i18n nit left on Companion settings copy — can follow up in a tiny commit if you want it pre-merge
Design: the June probe-vs-invert question is closed — FCM-first + per-dispatch gate, no open architectural choices

This is the hub surface for the native companion we discussed. My dogfood stack is blocked until it lands upstream. Ready for review/merge when you have bandwidth — companion app is at heavygee/hapi-companion; happy to transfer under tiann if you prefer.

Thanks.

Add en/zh-CN keys for the Companion section title and CompanionPairing strings; matches locale-driven Settings pattern (bot Minor on tiann#803). Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Do not treat generic FCM NOT_FOUND as a dead device token — a 404/NOT_FOUND can also come from a bad project/resource configuration, and this branch deletes every registration for that namespace. Evidence hub/src/fcm/fcmService.ts:228.
Suggested fix:

const parsedError = (() => {
    try {
        return JSON.parse(body) as { error?: { status?: string; details?: Array<{ fieldViolations?: Array<{ field?: string }> }> } }
    } catch {
        return null
    }
})()
const status = parsedError?.error?.status ?? ''
const tokenFieldViolation = parsedError?.error?.details?.some((detail) =>
    detail.fieldViolations?.some((violation) => /message\.token|token/i.test(violation.field ?? ''))
) ?? false
const isUnregistered = response.status === 404 && status === 'UNREGISTERED'
const isMalformedToken = response.status === 400 && status === 'INVALID_ARGUMENT' && tokenFieldViolation

[Major] Guard tiny truncation limits before slicing — when the summary consumes 278 or 279 chars, the remaining action budget is 1 or 2, so slice(0, limit - 3) uses a negative end index and returns almost the full action plus .... That defeats the payload/body cap and can make FCM reject large AGENT_NOTIFY_SUMMARY actions. Evidence hub/src/fcm/fcmNotificationChannel.ts:149.
Suggested fix:
```
private truncateReadyText(text: string, limit: number): string {
    const trimmed = text.trim()
    if (limit <= 0 || trimmed.length === 0) return ''
    if (trimmed.length <= limit) return trimmed
    if (limit <= 3) return '.'.repeat(limit)
    return trimmed.slice(0, limit - 3).trimEnd() + '...'
}
```

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The previous localization finding is addressed. Two FCM edge cases remain: generic NOT_FOUND can unregister live devices, and tiny truncation budgets can exceed the intended notification payload cap.

Testing

Not run (automation/static review only; did not execute PR code).

HAPI Bot

Parse FCM error JSON: only UNREGISTERED or token-field INVALID_ARGUMENT unregister devices; generic NOT_FOUND stays transient. Guard limit<=3 in truncateReadyText so tiny action budgets cannot blow the glance cap. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

[Major] Parse FCM-specific details[].errorCode before deciding token liveness — why it matters: FCM v1 commonly reports expired/unregistered tokens as HTTP 404 with root status: "NOT_FOUND" and details[].errorCode: "UNREGISTERED". The new parser only checks the root status for UNREGISTERED, so those known-dead tokens stay in fcm_devices, get retried on every notification, and are counted as transient pipeline failures instead of being pruned. Evidence hub/src/fcm/fcmService.ts:258.
Suggested fix:

type FcmErrorDetail = {
    '@type'?: string
    errorCode?: string
    fieldViolations?: Array<{ field?: string }>
}

const details = parsedError?.error?.details ?? []
const fcmErrorCode = details.find((detail: FcmErrorDetail) =>
    detail['@type'] === 'type.googleapis.com/google.firebase.fcm.v1.FcmError'
)?.errorCode ?? ''
const tokenFieldViolation = details.some((detail: FcmErrorDetail) =>
    detail.fieldViolations?.some((violation) => /message\.token|token/i.test(violation.field ?? ''))
) ?? false

const isUnregistered = status === 404 && fcmErrorCode === 'UNREGISTERED'
const isMalformedToken = status === 400
    && errorStatus === 'INVALID_ARGUMENT'
    && (fcmErrorCode === 'INVALID_ARGUMENT' || tokenFieldViolation)

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The previous truncation issue is addressed. One FCM error-shape gap remains: generic NOT_FOUND is no longer over-pruned, but real unregistered-token responses that carry the FCM code in details[] are now under-pruned.

Testing

Not run (automation/static review only; did not execute PR code). Suggested coverage: add FCM response cases for 404 NOT_FOUND + FcmError.UNREGISTERED and 400 INVALID_ARGUMENT + FcmError.INVALID_ARGUMENT, while keeping the existing BadRequest payload-error case non-pruning.

HAPI Bot

FCM v1 often returns HTTP 404 with root NOT_FOUND plus details[].errorCode UNREGISTERED; prune those tokens while keeping generic project/resource NOT_FOUND transient. Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions

Findings

None.

Questions

None.

Summary

Review mode: follow-up after new commits
Reviewed the full current diff plus prior HAPI Bot context. The prior FCM token-liveness finding is addressed in the current head: the parser now checks details[].errorCode for UNREGISTERED / INVALID_ARGUMENT and has regression coverage.
Residual risk: this is a static/automation review only; I did not exercise real FCM delivery, Firebase credential failures, or the native companion app end to end.

Testing

Not run (automation/static review only). Existing PR coverage includes FCM send/error parsing, native/web-push gating, device routes, migration, and message summary helpers.

HAPI Bot

github-actions Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread hub/src/startHub.ts Outdated

heavygee force-pushed the feat/companion-fcm-push-api branch from 9179ca9 to ae33e74 Compare June 4, 2026 18:39

github-actions Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread hub/src/startHub.ts Outdated

Comment thread hub/src/fcm/fcmService.ts Outdated

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread docs/api/native-companion-contract.md Outdated

Comment thread docs/api/native-companion-contract.md Outdated

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread docs/api/native-companion-contract.md Outdated

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread web/src/components/settings/CompanionPairing.tsx

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread hub/src/push/pushNotificationChannel.ts Outdated

github-actions Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread hub/src/push/pushNotificationChannel.ts Outdated

github-actions Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread hub/src/push/pushNotificationChannel.ts Outdated

github-actions Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread hub/src/push/pushNotificationChannel.ts Outdated

github-actions Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread hub/src/push/pushNotificationChannel.ts Outdated

heavygee force-pushed the feat/companion-fcm-push-api branch from 8010935 to fc9b6dd Compare June 17, 2026 15:18

github-actions Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread hub/src/push/pushNotificationChannel.ts Outdated

github-actions Bot reviewed Jun 17, 2026

View reviewed changes

heavygee and others added 5 commits June 20, 2026 16:15

heavygee and others added 9 commits June 20, 2026 16:15

docs(contract): drop trailing whitespace, use blank line for paragrap…

16bafb9

…h break Co-authored-by: Cursor <cursoragent@cursor.com>

heavygee force-pushed the feat/companion-fcm-push-api branch from 915d5f2 to 8f87051 Compare June 20, 2026 15:16