GRA-1144: add opt-in SDK WAU telemetry#251
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 Walkthrough
WalkthroughThis PR introduces session heartbeat telemetry alongside existing activation events. It defines heartbeat event types, extends HTTP posting to support endpoint overrides, implements session ping and WAU fetch utilities, integrates session pings at startup, adds a CLI command to view WAU metrics, and documents the telemetry privacy model. ChangesSession Heartbeat Telemetry
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.22.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/README.md`:
- Around line 152-156: Update the README paragraph that begins "It never sends
code, file paths, lesson/correction text, prompts, emails, names, stack traces,
environment variables, or raw IP addresses." to remove the absolute claim about
raw IPs and clarify that while the SDK payload does not include IP fields, the
telemetry service and network infrastructure can still observe the source IP at
transport time; keep the note about GRADATA_TELEMETRY=0 and the wau_ping
behavior intact. Reference the exact sentence starting with "It never sends ..."
when locating the text to edit and replace that clause about raw IPs with the
revised, accurate wording about transport-level visibility.
In `@Gradata/src/gradata/cli.py`:
- Around line 135-150: Add deterministic unit tests that exercise the CLI wiring
for the new telemetry command: create tests (e.g. tests/test_cli_telemetry.py)
that invoke gradata.cli.cmd_telemetry via argparse-like invocation or by calling
the function with a fake args namespace to assert behavior of cmd_telemetry;
include cases for success JSON output (args.json=True) and human-readable output
(args.json=False) using a mocked gradata._telemetry.fetch_wau to return
predictable dicts (including a dict with an "error" key to test the error/status
path), and verify printed stdout for WAU, Week start, and Status messages as
well as JSON formatting. Ensure tests avoid nondeterministic calls and
patch/monkeypatch gradata._telemetry.fetch_wau so CI is deterministic.
In `@Gradata/tests/test_telemetry.py`:
- Around line 139-146: Update the test_session_ping_posts_wau_to_ping_endpoint
test to assert that send_session_ping posts to the derived ping URL by checking
the first arg passed into the mocked _post call equals
_telemetry._ping_endpoint(); specifically, after calling
_telemetry.send_session_ping(blocking=True) and before asserting event payload,
add an assertion that post.call_args[0][0]["url"] (or the correct key used for
the request URL in the _post payload) == _telemetry._ping_endpoint() so the test
verifies send_session_ping() uses _telemetry._ping_endpoint() rather than the
generic event endpoint.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 78023ecc-3aaa-47aa-a55a-81d5b227b881
📒 Files selected for processing (5)
Gradata/README.mdGradata/src/gradata/_telemetry.pyGradata/src/gradata/cli.pyGradata/src/gradata/hooks/session_boot.pyGradata/tests/test_telemetry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest (py3.12)
- GitHub Check: pytest (py3.11)
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.11
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/hooks/session_boot.pyGradata/src/gradata/cli.pyGradata/src/gradata/_telemetry.py
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_telemetry.py
🧠 Learnings (2)
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Use Python 3.11+ — distribute to PyPI as `gradata` under Apache-2.0 license with architecture: Local-first SQLite + JSONL event log, optional cloud sync
Applied to files:
Gradata/README.md
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Use `from gradata import Brain` as the public entry point — `brain.correct()` is THE entry point for the headline product promise
Applied to files:
Gradata/README.md
| It never sends code, file paths, lesson/correction text, prompts, emails, names, | ||
| stack traces, environment variables, or raw IP addresses. Set | ||
| `GRADATA_TELEMETRY=0` to disable telemetry for any session, even if you opted in. | ||
| For dogfood metrics, `wau_ping` fires once on each agent session start and powers | ||
| weekly active user reporting: |
There was a problem hiding this comment.
Do not claim the SDK never sends raw IP addresses.
The payload excludes IP fields, but the telemetry service and normal network infrastructure still see the request source IP at transport time. As written, this privacy disclosure is inaccurate.
Suggested rewording
-It never sends code, file paths, lesson/correction text, prompts, emails, names,
-stack traces, environment variables, or raw IP addresses. Set
+The SDK payload never includes code, file paths, lesson/correction text, prompts,
+emails, names, stack traces, environment variables, or IP address fields. Like
+any HTTPS request, the telemetry service and standard network infrastructure may
+still process the source IP at transport time. Set
`GRADATA_TELEMETRY=0` to disable telemetry for any session, even if you opted in.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| It never sends code, file paths, lesson/correction text, prompts, emails, names, | |
| stack traces, environment variables, or raw IP addresses. Set | |
| `GRADATA_TELEMETRY=0` to disable telemetry for any session, even if you opted in. | |
| For dogfood metrics, `wau_ping` fires once on each agent session start and powers | |
| weekly active user reporting: | |
| The SDK payload never includes code, file paths, lesson/correction text, prompts, | |
| emails, names, stack traces, environment variables, or IP address fields. Like | |
| any HTTPS request, the telemetry service and standard network infrastructure may | |
| still process the source IP at transport time. Set | |
| `GRADATA_TELEMETRY=0` to disable telemetry for any session, even if you opted in. | |
| For dogfood metrics, `wau_ping` fires once on each agent session start and powers | |
| weekly active user reporting: |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/README.md` around lines 152 - 156, Update the README paragraph that
begins "It never sends code, file paths, lesson/correction text, prompts,
emails, names, stack traces, environment variables, or raw IP addresses." to
remove the absolute claim about raw IPs and clarify that while the SDK payload
does not include IP fields, the telemetry service and network infrastructure can
still observe the source IP at transport time; keep the note about
GRADATA_TELEMETRY=0 and the wau_ping behavior intact. Reference the exact
sentence starting with "It never sends ..." when locating the text to edit and
replace that clause about raw IPs with the revised, accurate wording about
transport-level visibility.
| def cmd_telemetry(args): | ||
| """Telemetry visibility commands.""" | ||
| from gradata import _telemetry | ||
|
|
||
| if args.telemetry_cmd == "wau": | ||
| data = _telemetry.fetch_wau() | ||
| if args.json: | ||
| print(json.dumps(data, indent=2, sort_keys=True)) | ||
| return | ||
| print(f"WAU: {data.get('wau', 0)}") | ||
| if data.get("week_start"): | ||
| print(f"Week start: {data['week_start']}") | ||
| if data.get("error"): | ||
| print(f"Status: {data['error']}") | ||
| return | ||
| raise SystemExit("unknown telemetry command") |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add deterministic CLI coverage for gradata telemetry wau.
This introduces a new user-facing command, but the provided tests only cover _telemetry.fetch_wau(). Please add CLI-level tests for argparse wiring, --json, and the human-readable error/success output paths.
As per coding guidelines, "Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/cli.py` around lines 135 - 150, Add deterministic unit
tests that exercise the CLI wiring for the new telemetry command: create tests
(e.g. tests/test_cli_telemetry.py) that invoke gradata.cli.cmd_telemetry via
argparse-like invocation or by calling the function with a fake args namespace
to assert behavior of cmd_telemetry; include cases for success JSON output
(args.json=True) and human-readable output (args.json=False) using a mocked
gradata._telemetry.fetch_wau to return predictable dicts (including a dict with
an "error" key to test the error/status path), and verify printed stdout for
WAU, Week start, and Status messages as well as JSON formatting. Ensure tests
avoid nondeterministic calls and patch/monkeypatch gradata._telemetry.fetch_wau
so CI is deterministic.
| def test_session_ping_posts_wau_to_ping_endpoint(self, monkeypatch): | ||
| _telemetry.set_enabled(True) | ||
| monkeypatch.setenv(_telemetry.ENV_ENDPOINT, "https://api.example.com/telemetry/event") | ||
| with patch.object(_telemetry, "_post", return_value=True) as post: | ||
| _telemetry.send_session_ping(blocking=True) | ||
| post.assert_called_once() | ||
| assert post.call_args[0][0]["event"] == "wau_ping" | ||
| assert _telemetry._ping_endpoint() == "https://api.example.com/telemetry/ping" |
There was a problem hiding this comment.
Assert the derived endpoint on the _post call.
This currently proves _ping_endpoint() computes the right URL, but not that send_session_ping() actually uses it. A regression that posts wau_ping to /telemetry/event would still pass here.
Suggested assertion
with patch.object(_telemetry, "_post", return_value=True) as post:
_telemetry.send_session_ping(blocking=True)
post.assert_called_once()
assert post.call_args[0][0]["event"] == "wau_ping"
+ assert post.call_args.kwargs["endpoint"] == "https://api.example.com/telemetry/ping"
assert _telemetry._ping_endpoint() == "https://api.example.com/telemetry/ping"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/tests/test_telemetry.py` around lines 139 - 146, Update the
test_session_ping_posts_wau_to_ping_endpoint test to assert that
send_session_ping posts to the derived ping URL by checking the first arg passed
into the mocked _post call equals _telemetry._ping_endpoint(); specifically,
after calling _telemetry.send_session_ping(blocking=True) and before asserting
event payload, add an assertion that post.call_args[0][0]["url"] (or the correct
key used for the request URL in the _post payload) ==
_telemetry._ping_endpoint() so the test verifies send_session_ping() uses
_telemetry._ping_endpoint() rather than the generic event endpoint.
Implements the SDK/CLI side of GRA-1144 WAU telemetry.
wau_pingon agent session startgradata telemetry waureadback commandVerification:
python3 -m pytest tests/test_telemetry.py -q→ 39 passed