| 1 | +# Specification: Watchdog Process Architecture |
| 2 | + |
| 3 | +This document specifies the architecture for isolating telemetry into a separate "Sidecar" process (Watchdog) so that events are still transmitted reliably when the host process is terminated abruptly (e.g. `kill -9`). |
| 4 | + |
| 5 | +## Architecture |
| 6 | + |
| 7 | +The system consists of two processes: |
| 8 | +1. **Main Process (Client)**: The DevTools MCP server. Spawns the Watchdog and sends raw telemetry events via IPC (Stdin). |
| 9 | +2. **Watchdog Process (Server)**: A simplified Node.js process. Manages Session IDs, enriches events, and handles the reliable transmission of `server_shutdown` upon detecting Main Process death. |
| 10 | + |
| 11 | +## Phase 1: Shared Core & Types (`src/telemetry/types.ts`) |
| 12 | + |
| 13 | +Define the IPC protocol between the Main Process and the Watchdog. |
| 14 | + |
| 15 | +* **Modify** `src/telemetry/types.ts`: |
| 16 | + * `IpcMessageType` enum: `DATA = 'data'`. |
| 17 | + * `TelemetryEvent` interface: `{ type: IpcMessageType, payload: ChromeDevToolsMcpExtension }`. |
| 18 | + * Ensure `ChromeDevToolsMcpExtension` (Protobuf mapping) is exported. |
| 19 | + * Add `ServerShutdown` interface: `export interface ServerShutdown {}`. |
| 20 | + * Update `ChromeDevToolsMcpExtension` to include `server_shutdown?: ServerShutdown`. |
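The envelope described above can be illustrated at runtime. This is a minimal sketch in plain JavaScript rather than the TypeScript declarations themselves; `makeTelemetryEvent` and `isTelemetryEvent` are hypothetical helpers introduced only for illustration, and the payload shape is assumed from the bullet list.

```javascript
// Sketch only: plain-JavaScript stand-ins for the TypeScript declarations above.
const IpcMessageType = Object.freeze({ DATA: 'data' });

// Hypothetical helper: wraps a ChromeDevToolsMcpExtension-shaped payload
// in a TelemetryEvent envelope.
function makeTelemetryEvent(payload) {
  return { type: IpcMessageType.DATA, payload };
}

// Hypothetical helper: a minimal runtime check a receiver could apply
// before trusting a decoded message.
function isTelemetryEvent(value) {
  return (
    typeof value === 'object' &&
    value !== null &&
    value.type === IpcMessageType.DATA &&
    typeof value.payload === 'object' &&
    value.payload !== null
  );
}

// A server_shutdown payload is an empty object, matching `ServerShutdown {}`.
const shutdown = makeTelemetryEvent({ server_shutdown: {} });
console.log(JSON.stringify(shutdown));
// prints {"type":"data","payload":{"server_shutdown":{}}}
```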
| 21 | + |
| 22 | +## Phase 2: Watchdog Process (The Sidecar) |
| 23 | + |
| 24 | +Location: `src/telemetry/watchdog/` |
| 25 | + |
| 26 | +### 2.1 Watchdog Sender (`src/telemetry/watchdog/clearcut-sender.ts`) |
| 27 | + |
| 28 | +A stateful class responsible for session management and transport. |
| 29 | + |
| 30 | +**State:** |
| 31 | +* `#appVersion`: string |
| 32 | +* `#osType`: OsType |
| 33 | +* `#sessionId`: string (Hex string) |
| 34 | +* `#sessionCreated`: number (Timestamp) |
| 35 | + |
| 36 | +**Behavior:** |
| 37 | +* **Initialization**: |
| 38 | + * Generate initial `#sessionId` using `crypto.randomUUID()`. |
| 39 | + * Set `#sessionCreated = Date.now()`. |
| 40 | +* **`send(event: ChromeDevToolsMcpExtension)`**: |
| 41 | + 1. **Session Rotation**: |
| 42 | + * Check `Date.now() - #sessionCreated > 24 * 60 * 60 * 1000` (24 hours). |
| 43 | + * If expired: Generate new `#sessionId` (UUID), reset `#sessionCreated`. |
| 44 | + 2. **Enrichment**: Deep-clone `event`, then populate: |
| 45 | + * `client_info.client_type` |
| 46 | + * `session_id` (from state) |
| 47 | + * `app_version` (from state) |
| 48 | + * `os_type` (from state) |
| 49 | + 3. **Transport**: |
| 50 | + * Log the JSON stringified event using the `logger` from `../../logger.js`. |
| 51 | +* **`sendShutdownEvent()`**: |
| 52 | + * Construct `server_shutdown` event. |
| 53 | + * **CRITICAL**: Do *not* include `flag_usage` (Main process might not have sent it if crashed). |
| 54 | + * Calls `this.send(shutdownEvent)`. |
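The state and behavior listed above could be sketched roughly as follows. This is not the project's implementation: the injected `now` and `log` options are assumptions added to keep the example testable, the placement of `session_id`, `app_version`, and `os_type` at the top level of the event is assumed, and `client_info.client_type` is left out because the spec does not state its value. CommonJS `require` is used so the snippet runs standalone.

```javascript
const { randomUUID } = require('node:crypto');

const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

class ClearcutSender {
  #appVersion;
  #osType;
  #sessionId;
  #sessionCreated;
  #now;
  #log;

  // `now` and `log` are injectable for testing only (assumption, not in the spec).
  constructor({ appVersion, osType, now = Date.now, log = console.error }) {
    this.#appVersion = appVersion;
    this.#osType = osType;
    this.#now = now;
    this.#log = log;
    this.#sessionId = randomUUID();
    this.#sessionCreated = now();
  }

  send(event) {
    // 1. Session rotation: start a new session once the current one is older than 24 hours.
    if (this.#now() - this.#sessionCreated > SESSION_TTL_MS) {
      this.#sessionId = randomUUID();
      this.#sessionCreated = this.#now();
    }
    // 2. Enrichment on a deep copy, so the caller's object is never mutated.
    const enriched = structuredClone(event);
    enriched.session_id = this.#sessionId;
    enriched.app_version = this.#appVersion;
    enriched.os_type = this.#osType;
    // 3. Transport: in this phase the event is only logged as JSON.
    this.#log(JSON.stringify(enriched));
    return enriched;
  }

  sendShutdownEvent() {
    // Deliberately carries no flag_usage field.
    return this.send({ server_shutdown: {} });
  }
}
```

Injecting the clock keeps the rotation check deterministic in unit tests without patching the global `Date`.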
| 55 | + |
| 56 | +### 2.2 Watchdog Entry Point (`src/telemetry/watchdog/main.ts`) |
| 57 | + |
| 58 | +The executable script. |
| 59 | + |
| 60 | +**Inputs (CLI Args):** |
| 61 | +* `--parent-pid`: PID of the Main Process to monitor. |
| 62 | +* `--app-version`: String. |
| 63 | +* `--os-type`: Integer (OsType). |
| 64 | +* `--log-file`: Absolute path for debugging logs (optional). |
| 65 | + |
| 66 | +**Logic Flow:** |
| 67 | +1. **Bootstrap**: |
| 68 | + * **Arg Parsing**: Use `yargs` to parse the command-line arguments. |
| 69 | + * Setup internal logging (to file if `--log-file` provided, otherwise silent/stderr). |
| 70 | + * Instantiate `ClearcutSender`. |
| 71 | +2. **IPC Handling (Stdin)**: |
| 72 | + * `process.stdin.resume()` and set encoding `utf8`. |
| 73 | + * Stream parser (using `readline` or buffer splitting) for newline-delimited JSON. |
| 74 | + * On valid message: `sender.send(payload)`. |
| 75 | +3. **Death Detection**: |
| 76 | + * **Mechanism**: Listen for the `'end'` and `'close'` events on `process.stdin`. |
| 77 | + * **Trigger**: On whichever event fires first (both are normally emitted, so guard against running the steps twice): |
| 78 | + 1. Log "Parent death detected". |
| 79 | + 2. `await sender.sendShutdownEvent()`. |
| 80 | + 3. `process.exit(0)`. |
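| 81 | +4. **Disconnect**: Handle `process.on('disconnect')` the same way as parent death. Note that Node.js emits `'disconnect'` only when the child was spawned with an IPC channel, which the Stdio Config in Phase 3 does not set up. |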
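The stdin handling and death detection in the logic flow above might be wired as in the sketch below. It uses buffer splitting rather than `readline`; `createNdjsonParser`, `once`, and `attachStdin` are hypothetical names, and the `sender` and `exit` parameters stand in for a `ClearcutSender` instance and `process.exit`.

```javascript
// Hypothetical helper: incremental parser for newline-delimited JSON.
// Chunks may end mid-line, so the trailing partial line is kept in a buffer.
function createNdjsonParser(onMessage, onError = () => {}) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is an incomplete line (or '')
    for (const line of lines) {
      if (line.trim() === '') continue;
      let message;
      try {
        message = JSON.parse(line);
      } catch (err) {
        onError(err, line); // malformed line: report and keep going
        continue;
      }
      onMessage(message);
    }
  };
}

// Hypothetical helper: ensures the shutdown path runs a single time
// even if both 'end' and 'close' are emitted.
function once(fn) {
  let called = false;
  return (...args) => {
    if (called) return undefined;
    called = true;
    return fn(...args);
  };
}

// Wires a stdin-like stream to the sender. In the real entry point this
// would be called as attachStdin(process.stdin, sender, process.exit).
function attachStdin(stdin, sender, exit) {
  stdin.setEncoding('utf8');
  stdin.on('data', createNdjsonParser((message) => sender.send(message.payload)));
  const onParentDeath = once(async () => {
    await sender.sendShutdownEvent();
    exit(0);
  });
  stdin.on('end', onParentDeath);
  stdin.on('close', onParentDeath);
}
```

Keeping the parser separate from the stream wiring lets it be unit-tested with plain strings.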
| 82 | + |
| 83 | +## Phase 3: Main Process Client (`src/telemetry/watchdog-client.ts`) |
| 84 | + |
| 85 | +Abstracts the sidecar management. |
| 86 | + |
| 87 | +**Class: `WatchdogClient`** |
| 88 | +* **Constructor**: |
| 89 | + * `parentPid`, `appVersion`, `osType`, `logFile`. |
| 90 | +* **Spawning Logic**: |
| 91 | + * **Path Resolution**: Use `import.meta.url` with `fileURLToPath` (from `node:url`) to resolve the absolute path to `watchdog/main.js`. |
| 92 | + ```javascript |
| 93 | + const watchdogPath = fileURLToPath(new URL('./watchdog/main.js', import.meta.url)); |
| 94 | + ``` |
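| 95 | + * **Spawn**: `child_process.spawn(process.execPath, [watchdogPath, ...args])`, where `args` are the CLI flags from section 2.2 (`--parent-pid`, `--app-version`, `--os-type`, and optionally `--log-file`). |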
| 96 | + * **Stdio Config**: `['pipe', 'ignore', 'ignore']` (Strictly ignore output, rely on `--log-file`). |
| 97 | + * **Lifecycle**: `unref()` the child so it doesn't block Main Process exit. |
| 98 | +* **`send(message: TelemetryEvent)`**: |
| 99 | + * JSON-stringify `message`. |
| 100 | + * Append `\n`. |
| 101 | + * Write to `child.stdin`. |
| 102 | + * Handle errors (EPIPE) gracefully (log warning, do not crash). |
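The argument list and the write path described above can be sketched as below. `buildWatchdogArgs`, `guardStdin`, and `sendToWatchdog` are hypothetical helpers, not names from the spec. The sketch assumes the stream passed in is the child's `stdin` as returned by `child_process.spawn`, and that warnings go to an injectable `warn` function.

```javascript
// Hypothetical helper: turns the constructor inputs into the CLI flags from section 2.2.
function buildWatchdogArgs({ parentPid, appVersion, osType, logFile }) {
  const args = [
    `--parent-pid=${parentPid}`,
    `--app-version=${appVersion}`,
    `--os-type=${osType}`,
  ];
  if (logFile) args.push(`--log-file=${logFile}`); // optional flag
  return args;
}

// EPIPE is reported asynchronously as an 'error' event on the stream.
// Without a listener, that event would crash the Main Process.
function guardStdin(stdin, warn = console.warn) {
  stdin.on('error', (err) => warn(`watchdog stdin error: ${err.code || err.message}`));
}

// Writes one newline-terminated JSON message; returns false instead of throwing
// when the pipe is already unusable.
function sendToWatchdog(stdin, message, warn = console.warn) {
  if (!stdin || stdin.destroyed || !stdin.writable) {
    warn('watchdog stdin is not writable; dropping telemetry event');
    return false;
  }
  try {
    stdin.write(JSON.stringify(message) + '\n');
    return true;
  } catch (err) {
    warn(`watchdog write failed: ${err.message}`);
    return false;
  }
}
```

A caller would typically run `guardStdin(child.stdin)` once right after spawning, then use `sendToWatchdog(child.stdin, message)` for every event.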
| 103 | + |
| 104 | +## Phase 4: Integration & Testing |
| 105 | + |
| 106 | +### 4.1 Refactor `ClearcutLogger` |
| 107 | +* Remove direct `ClearcutSender` usage. |
| 108 | +* Instantiate `WatchdogClient` in constructor. |
| 109 | +* Proxy methods (`logToolInvocation`, etc.) to `client.send()`. |
| 110 | + |
| 111 | +### 4.2 Verification Plan |
| 112 | +1. **Unit Tests**: |
| 113 | + * `clearcut-sender.test.ts`: Verify enrichment and session rotation logic with a mocked `Date`. |
| 114 | +2. **Integration Tests (`spawn-kill-verify`)**: |
| 115 | + * Script: Spawn Main Process (mock). |
| 116 | + * Main Process spawns Watchdog. |
| 117 | + * Main Process kills itself (`kill -9`). |
| 118 | + * **Assert**: Watchdog writes `server_shutdown` to log/transport before exiting. |