
GRA-1144: add opt-in SDK WAU telemetry#251

Open
Gradata wants to merge 1 commit into
mainfrom
gra-1144-wau-telemetry

Conversation

@Gradata
Owner

@Gradata Gradata commented Jun 3, 2026

Implements the SDK/CLI side of GRA-1144 WAU telemetry.

  • sends anonymous opt-in wau_ping on agent session start
  • keeps telemetry off by default; existing config prompt/kill-switch still applies
  • adds gradata telemetry wau readback command
  • documents opt-in and privacy disclosure in README

Verification:

  • python3 -m pytest tests/test_telemetry.py -q → 39 passed


@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

📝 Walkthrough
  • WAU Telemetry Implementation: Adds opt-in weekly active user (wau_ping) telemetry that sends anonymous usage data when an agent session starts; disabled by default
  • New CLI Command: gradata telemetry wau provides a readback command to fetch and display live WAU metrics
  • Session Hook Integration: Session boot hook automatically triggers send_session_ping() to record WAU events at session start with graceful error handling (debug logs only)
  • New Public API Methods:
    • send_session_ping(*, blocking: bool = False) – sends opt-in WAU ping (best-effort, background thread by default)
    • fetch_wau(timeout: float = 3.0) – fetches WAU aggregate data from telemetry endpoint
  • Telemetry Event Types: Extends telemetry system with HEARTBEAT_EVENTS constant and TelemetryEventName type covering activation and heartbeat events
  • Enhanced HTTP Routing: _post() method now accepts optional endpoint override for routing different event types to appropriate endpoints
  • Privacy Documentation: README updated with opt-in behavior, data categories sent, explicit privacy disclosures (no code, file paths, prompts, emails, names, stack traces, or raw IPs), and how to disable via GRADATA_TELEMETRY=0
  • Event Validation Update: send_event() validation extended from activation events only to all telemetry events (including wau_ping)
  • Test Coverage: 39 tests passing, including new tests for session ping posting and WAU fetch error handling

Walkthrough

This PR introduces session heartbeat telemetry alongside existing activation events. It defines heartbeat event types, extends HTTP posting to support endpoint overrides, implements session ping and WAU fetch utilities, integrates session pings at startup, adds a CLI command to view WAU metrics, and documents the telemetry privacy model.
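
The opt-in, best-effort behaviour described above can be sketched roughly as follows. This is not the PR's code: `is_enabled`, the injected `post` callable, the `opted_in` parameter, and the payload shape are hypothetical stand-ins so the block runs on its own. Only the `wau_ping` event name, the `blocking` flag, the background-thread default, and the `GRADATA_TELEMETRY=0` kill switch are taken from the PR text.

```python
import logging
import os
import threading
from typing import Callable, Optional

logger = logging.getLogger("telemetry_sketch")


def is_enabled(opted_in: bool) -> bool:
    """Telemetry runs only if the user opted in and the kill switch is unset."""
    if os.environ.get("GRADATA_TELEMETRY") == "0":
        return False
    return opted_in


def send_session_ping(
    post: Callable[[dict], object],  # hypothetical transport, injected for the sketch
    *,
    opted_in: bool,
    blocking: bool = False,
) -> Optional[threading.Thread]:
    """Send one wau_ping; never raise into the caller."""
    if not is_enabled(opted_in):
        return None

    payload = {"event": "wau_ping"}  # assumed minimal payload

    def _send() -> None:
        try:
            post(payload)
        except Exception:
            # Best-effort: a failed ping is logged at debug level only.
            logger.debug("wau_ping failed", exc_info=True)

    if blocking:
        _send()
        return None

    # Default path: fire and forget on a daemon thread so session start is not delayed.
    thread = threading.Thread(target=_send, daemon=True)
    thread.start()
    return thread
```

Passing `blocking=True` is mainly useful in tests, where the ping must have completed before assertions run.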

Changes

Session Heartbeat Telemetry

| Layer / File(s) | Summary |
| --- | --- |
| Telemetry event types and core utilities (Gradata/src/gradata/_telemetry.py) | Defines HEARTBEAT_EVENTS (wau_ping) and combines activation + heartbeat into TELEMETRY_EVENTS; adds TelemetryEventName type; extends _post to accept optional endpoint override; updates send_event validation for combined event set; implements _ping_endpoint, send_session_ping, and fetch_wau (with fallback error dict on failures). |
| Session ping initialization (Gradata/src/gradata/hooks/session_boot.py) | Calls send_session_ping() at session startup, wrapped in try/except to log debug on failure without blocking session continuation. |
| CLI telemetry visibility command (Gradata/src/gradata/cli.py) | Adds cmd_telemetry(args) handler to fetch and display WAU data; supports --json or formatted output; integrates telemetry subcommand with wau subcommand into argparse and command dispatch. |
| Telemetry tests (Gradata/tests/test_telemetry.py) | Validates send_session_ping() posts wau_ping when enabled and is a no-op when disabled; verifies _ping_endpoint() derives correct URL; confirms fetch_wau() returns fallback error dict on unreachable endpoints. |
| Privacy and telemetry documentation (Gradata/README.md) | Documents opt-in telemetry (default-off), consent flow, stored config path, allowed fields (event name, hashed user_id, UTC timestamp, SDK version), explicit exclusions (code, file paths, content, prompts, emails, names, stack traces, env vars, IPs), disable via GRADATA_TELEMETRY=0, and example wau_ping command. |
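
To make the `_ping_endpoint` and `fetch_wau` rows concrete, here is a minimal sketch under stated assumptions. The only behaviour confirmed by the PR is that an event URL ending in `/telemetry/event` maps to `/telemetry/ping`, that `fetch_wau` takes a `timeout` defaulting to 3.0, and that it returns a fallback dict with an error on failure. The function names without a leading underscore, the explicit `url` parameter, the non-`/event` fallback rule, and the exact keys of the fallback dict are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def ping_endpoint(event_endpoint: str) -> str:
    """Derive a sibling .../ping URL from an .../event URL (assumed convention)."""
    parts = urlsplit(event_endpoint)
    path = parts.path.rstrip("/")
    base, _, last = path.rpartition("/")
    # Swap a trailing "event" segment for "ping"; otherwise append "ping" (assumption).
    new_path = f"{base}/ping" if last == "event" else f"{path}/ping"
    return urlunsplit(parts._replace(path=new_path))


def fetch_wau(url: str, timeout: float = 3.0) -> dict:
    """Fetch the WAU aggregate; return a fallback dict instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        if not isinstance(data, dict):
            raise ValueError("unexpected response shape")
        return data
    except (OSError, ValueError) as exc:
        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
        return {"wau": 0, "error": f"unavailable: {type(exc).__name__}"}
```

Returning a dict with an `error` key, rather than raising, lets a caller such as the CLI handler print a status line without its own exception handling.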

🎯 3 (Moderate) | ⏱️ ~20 minutes

feature, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title 'GRA-1144: add opt-in SDK WAU telemetry' clearly and concisely summarizes the main change: adding WAU telemetry functionality to the SDK with opt-in behavior. |
| Description check | ✅ Passed | The pull request description is directly related to the changeset, detailing the implementation of WAU telemetry including session pings, opt-in/default-off behavior, CLI command, README documentation, and test verification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.22][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/README.md`:
- Around line 152-156: Update the README paragraph that begins "It never sends
code, file paths, lesson/correction text, prompts, emails, names, stack traces,
environment variables, or raw IP addresses." to remove the absolute claim about
raw IPs and clarify that while the SDK payload does not include IP fields, the
telemetry service and network infrastructure can still observe the source IP at
transport time; keep the note about GRADATA_TELEMETRY=0 and the wau_ping
behavior intact. Reference the exact sentence starting with "It never sends ..."
when locating the text to edit and replace that clause about raw IPs with the
revised, accurate wording about transport-level visibility.

In `@Gradata/src/gradata/cli.py`:
- Around line 135-150: Add deterministic unit tests that exercise the CLI wiring
for the new telemetry command: create tests (e.g. tests/test_cli_telemetry.py)
that invoke gradata.cli.cmd_telemetry via argparse-like invocation or by calling
the function with a fake args namespace to assert behavior of cmd_telemetry;
include cases for success JSON output (args.json=True) and human-readable output
(args.json=False) using a mocked gradata._telemetry.fetch_wau to return
predictable dicts (including a dict with an "error" key to test the error/status
path), and verify printed stdout for WAU, Week start, and Status messages as
well as JSON formatting. Ensure tests avoid nondeterministic calls and
patch/monkeypatch gradata._telemetry.fetch_wau so CI is deterministic.

In `@Gradata/tests/test_telemetry.py`:
- Around line 139-146: Update the test_session_ping_posts_wau_to_ping_endpoint
test to assert that send_session_ping posts to the derived ping URL by checking
that the endpoint passed into the mocked _post call equals
_telemetry._ping_endpoint(). The first positional argument of _post is the event
payload, so check the endpoint override instead: after calling
_telemetry.send_session_ping(blocking=True), add an assertion that
post.call_args.kwargs["endpoint"] (or whichever argument carries the request URL
in the _post call) == _telemetry._ping_endpoint() so the test verifies
send_session_ping() uses _telemetry._ping_endpoint() rather than the generic
event endpoint.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 78023ecc-3aaa-47aa-a55a-81d5b227b881

📥 Commits

Reviewing files that changed from the base of the PR and between 0f1513c and af059d1.

📒 Files selected for processing (5)
  • Gradata/README.md
  • Gradata/src/gradata/_telemetry.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/hooks/session_boot.py
  • Gradata/tests/test_telemetry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest (py3.12)
  • GitHub Check: pytest (py3.11)
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/hooks/session_boot.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/_telemetry.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_telemetry.py
🧠 Learnings (2)
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Use Python 3.11+ — distribute to PyPI as `gradata` under Apache-2.0 license with architecture: Local-first SQLite + JSONL event log, optional cloud sync

Applied to files:

  • Gradata/README.md
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Use `from gradata import Brain` as the public entry point — `brain.correct()` is THE entry point for the headline product promise

Applied to files:

  • Gradata/README.md

Comment thread Gradata/README.md
Comment on lines +152 to +156
It never sends code, file paths, lesson/correction text, prompts, emails, names,
stack traces, environment variables, or raw IP addresses. Set
`GRADATA_TELEMETRY=0` to disable telemetry for any session, even if you opted in.
For dogfood metrics, `wau_ping` fires once on each agent session start and powers
weekly active user reporting:

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not claim the SDK never sends raw IP addresses.

The payload excludes IP fields, but the telemetry service and normal network infrastructure still see the request source IP at transport time. As written, this privacy disclosure is inaccurate.

Suggested rewording
-It never sends code, file paths, lesson/correction text, prompts, emails, names,
-stack traces, environment variables, or raw IP addresses. Set
+The SDK payload never includes code, file paths, lesson/correction text, prompts,
+emails, names, stack traces, environment variables, or IP address fields. Like
+any HTTPS request, the telemetry service and standard network infrastructure may
+still process the source IP at transport time. Set
 `GRADATA_TELEMETRY=0` to disable telemetry for any session, even if you opted in.
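
One way to keep the "payload never includes" claim enforceable is an explicit field allowlist. The sketch below is illustrative only: the four field categories come from the README summary (event name, hashed user_id, UTC timestamp, SDK version), but the key names, the SHA-256 choice, and both helper functions are assumptions, not the SDK's implementation. Nothing here can hide the source IP, which the receiving server sees at transport time regardless of the payload.

```python
import hashlib
from datetime import datetime, timezone

# Assumed key names for the four documented field categories.
ALLOWED_FIELDS = frozenset({"event", "user_id", "timestamp", "sdk_version"})


def validate_payload(payload: dict) -> dict:
    """Reject any payload carrying a field outside the allowlist."""
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"disallowed telemetry fields: {sorted(unexpected)}")
    return payload


def build_payload(event: str, raw_user_id: str, sdk_version: str) -> dict:
    """Build a payload that carries only a hash of the local identifier."""
    return validate_payload(
        {
            "event": event,
            # Hash algorithm is an assumption; the README only says "hashed".
            "user_id": hashlib.sha256(raw_user_id.encode("utf-8")).hexdigest(),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sdk_version": sdk_version,
        }
    )
```

A check like `validate_payload` turns the privacy disclosure into something a unit test can assert, instead of a statement that has to be kept in sync by hand.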
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/README.md` around lines 152 - 156, Update the README paragraph that
begins "It never sends code, file paths, lesson/correction text, prompts,
emails, names, stack traces, environment variables, or raw IP addresses." to
remove the absolute claim about raw IPs and clarify that while the SDK payload
does not include IP fields, the telemetry service and network infrastructure can
still observe the source IP at transport time; keep the note about
GRADATA_TELEMETRY=0 and the wau_ping behavior intact. Reference the exact
sentence starting with "It never sends ..." when locating the text to edit and
replace that clause about raw IPs with the revised, accurate wording about
transport-level visibility.

Comment on lines +135 to +150
def cmd_telemetry(args):
    """Telemetry visibility commands."""
    from gradata import _telemetry

    if args.telemetry_cmd == "wau":
        data = _telemetry.fetch_wau()
        if args.json:
            print(json.dumps(data, indent=2, sort_keys=True))
            return
        print(f"WAU: {data.get('wau', 0)}")
        if data.get("week_start"):
            print(f"Week start: {data['week_start']}")
        if data.get("error"):
            print(f"Status: {data['error']}")
        return
    raise SystemExit("unknown telemetry command")

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add deterministic CLI coverage for gradata telemetry wau.

This introduces a new user-facing command, but the provided tests only cover _telemetry.fetch_wau(). Please add CLI-level tests for argparse wiring, --json, and the human-readable error/success output paths.

As per coding guidelines, "Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic)".
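
A deterministic test along these lines might look like the sketch below. To stay runnable without the package installed, it uses a stand-in `_telemetry` namespace and a copy of the handler from the diff; the helper `run_cli`, the test names, and the sample data values are hypothetical. In the real suite the patch target would be `gradata._telemetry.fetch_wau`, and the argparse wiring would need its own test against the actual parser.

```python
import contextlib
import io
import json
from types import SimpleNamespace
from unittest.mock import patch

# Stand-in for gradata._telemetry so the sketch runs on its own.
_telemetry = SimpleNamespace(fetch_wau=lambda timeout=3.0: {})


def cmd_telemetry(args):
    """Copy of the handler under review, wired to the stand-in above."""
    if args.telemetry_cmd == "wau":
        data = _telemetry.fetch_wau()
        if args.json:
            print(json.dumps(data, indent=2, sort_keys=True))
            return
        print(f"WAU: {data.get('wau', 0)}")
        if data.get("week_start"):
            print(f"Week start: {data['week_start']}")
        if data.get("error"):
            print(f"Status: {data['error']}")
        return
    raise SystemExit("unknown telemetry command")


def run_cli(fake_data, *, as_json):
    """Run the handler with fetch_wau mocked and return captured stdout."""
    args = SimpleNamespace(telemetry_cmd="wau", json=as_json)
    out = io.StringIO()
    with patch.object(_telemetry, "fetch_wau", return_value=fake_data):
        with contextlib.redirect_stdout(out):
            cmd_telemetry(args)
    return out.getvalue()


def test_json_output():
    data = {"wau": 7, "week_start": "2026-06-01"}
    assert json.loads(run_cli(data, as_json=True)) == data


def test_human_readable_success():
    out = run_cli({"wau": 7, "week_start": "2026-06-01"}, as_json=False)
    assert out == "WAU: 7\nWeek start: 2026-06-01\n"


def test_human_readable_error():
    out = run_cli({"wau": 0, "error": "unreachable"}, as_json=False)
    assert out == "WAU: 0\nStatus: unreachable\n"
```

Because `fetch_wau` is mocked in every case, no network call is made and the output is fully determined by the fake dict.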

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/cli.py` around lines 135 - 150, Add deterministic unit
tests that exercise the CLI wiring for the new telemetry command: create tests
(e.g. tests/test_cli_telemetry.py) that invoke gradata.cli.cmd_telemetry via
argparse-like invocation or by calling the function with a fake args namespace
to assert behavior of cmd_telemetry; include cases for success JSON output
(args.json=True) and human-readable output (args.json=False) using a mocked
gradata._telemetry.fetch_wau to return predictable dicts (including a dict with
an "error" key to test the error/status path), and verify printed stdout for
WAU, Week start, and Status messages as well as JSON formatting. Ensure tests
avoid nondeterministic calls and patch/monkeypatch gradata._telemetry.fetch_wau
so CI is deterministic.

Comment on lines +139 to +146
    def test_session_ping_posts_wau_to_ping_endpoint(self, monkeypatch):
        _telemetry.set_enabled(True)
        monkeypatch.setenv(_telemetry.ENV_ENDPOINT, "https://api.example.com/telemetry/event")
        with patch.object(_telemetry, "_post", return_value=True) as post:
            _telemetry.send_session_ping(blocking=True)
            post.assert_called_once()
            assert post.call_args[0][0]["event"] == "wau_ping"
        assert _telemetry._ping_endpoint() == "https://api.example.com/telemetry/ping"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the derived endpoint on the _post call.

This currently proves _ping_endpoint() computes the right URL, but not that send_session_ping() actually uses it. A regression that posts wau_ping to /telemetry/event would still pass here.

Suggested assertion
     with patch.object(_telemetry, "_post", return_value=True) as post:
         _telemetry.send_session_ping(blocking=True)
         post.assert_called_once()
         assert post.call_args[0][0]["event"] == "wau_ping"
+        assert post.call_args.kwargs["endpoint"] == "https://api.example.com/telemetry/ping"
     assert _telemetry._ping_endpoint() == "https://api.example.com/telemetry/ping"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/tests/test_telemetry.py` around lines 139 - 146, Update the
test_session_ping_posts_wau_to_ping_endpoint test to assert that
send_session_ping posts to the derived ping URL by checking that the endpoint
passed into the mocked _post call equals _telemetry._ping_endpoint(). The first
positional argument of _post is the event payload, so check the endpoint
override instead: after calling _telemetry.send_session_ping(blocking=True), add
an assertion that post.call_args.kwargs["endpoint"] (or whichever argument
carries the request URL in the _post call) == _telemetry._ping_endpoint() so the
test verifies send_session_ping() uses _telemetry._ping_endpoint() rather than
the generic event endpoint.
