feat: AGY (Antigravity) provider — with threading + full type coverage#243
feat: AGY (Antigravity) provider — with threading + full type coverage#243iansherr wants to merge 3 commits into
Conversation
- Add AgyProvider for Antigravity CLI sessions - Entry threading via parentUuid chaining (sequential DAG) - Handle all AGY entry types: USER_INPUT, PLANNER_RESPONSE, CHECKPOINT, LIST_DIRECTORY, GENERIC, RUN_COMMAND, VIEW_FILE, CODE_ACTION - Register in ProviderRegistry for auto-discovery - Pyright: 0 errors (strict mode) - All 2207 tests pass
|
Note: this PR builds on #242 (base abstraction + Claude adapter). It's meant as an add-on — the base abstraction should land first, then this adds AGY as the second fully-realized provider on top of it. The diff is against |
📝 WalkthroughWalkthroughAdds shared provider contracts, Claude and Agy session providers, a registry for discovery and loading, and a top-level module for aggregated session iteration and stats. ChangesProvider session discovery
Sequence Diagram(s)Discovery flow: sequenceDiagram
participant Client
participant discovery_py as discovery.py
participant ProviderRegistry
participant ClaudeProvider
participant AgyProvider
Client->>discovery_py: discover_all_sessions()
discovery_py->>ProviderRegistry: discover_providers()
discovery_py->>ProviderRegistry: discover_all_sessions()
ProviderRegistry->>ClaudeProvider: discover_sessions()
ProviderRegistry->>AgyProvider: discover_sessions()
Session load flow: sequenceDiagram
participant Client
participant discovery_py as discovery.py
participant ProviderRegistry
participant ClaudeProvider
participant AgyProvider
Client->>discovery_py: load_session(provider_name, session_id)
discovery_py->>ProviderRegistry: load_session(provider_name, session_id)
alt provider_name == claude
ProviderRegistry->>ClaudeProvider: load_session(session_id)
else provider_name == agy
ProviderRegistry->>AgyProvider: load_session(session_id)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
claude_code_log/providers/registry.py (1)
29-36: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winSilent broad-except hides provider init failures.
Swallowing every
Exceptionwith a barepassmakes a misconfigured or buggy provider silently disappear with no diagnostic trail, which is hard to debug in the field. Consider logging the failure (provider name + exception) at warning level.♻️ Proposed change
+import logging + +logger = logging.getLogger(__name__) + def instantiate_registered(self) -> None: - for provider_class in self._provider_classes.values(): + for name, provider_class in self._provider_classes.items(): try: provider = provider_class() self.register(provider) - except Exception: - # Skip providers that fail to initialize - pass + except Exception: + logger.warning("Provider %r failed to initialize", name, exc_info=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/providers/registry.py` around lines 29 - 36, The instantiate_registered method is swallowing all provider initialization failures with a bare except and pass, which hides misconfigured providers. Update this loop in Registry.instantiate_registered to catch the failure, log a warning that includes the provider_class name and the exception details, and then continue skipping that provider instead of silently ignoring it.claude_code_log/discovery.py (3)
21-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueDuplicates
ProviderRegistry.discover_all_sessions.When
providers is None, this loop re-implements logic already inregistry.discover_all_sessions()(Lines 54-58 of registry.py), including theis_available()gate. Consider delegating to the registry to avoid divergence, e.g. fall back toregistry.discover_all_sessions()for theNonecase and only filter by name when an explicit list is given.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/discovery.py` around lines 21 - 29, Replace the duplicated discovery logic in discover_sessions so the providers is None path delegates to ProviderRegistry.discover_all_sessions instead of re-implementing the availability check and iteration. Keep the explicit providers filter path using discover_providers() and registry.get_provider(provider_name), but for the default case call the registry method directly to stay aligned with ProviderRegistry.discover_all_sessions and avoid divergence.
45-55: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueRe-discovering providers on every facade call.
Each of
discover_all_sessions,discover_sessions_by_provider,get_session_stats, andload_sessioncallsdiscover_providers(), which re-instantiates all providers and re-runsis_available()filesystem checks. For callers invoking these in sequence this repeats work; consider accepting an optional pre-builtProviderRegistryor memoizing discovery.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/discovery.py` around lines 45 - 55, The session discovery helpers are repeatedly rebuilding provider state by calling discover_providers() inside each facade function, which re-runs availability checks and duplicates work; update discover_all_sessions, discover_sessions_by_provider, get_session_stats, and load_session to reuse a shared ProviderRegistry, either by accepting an optional registry parameter or by memoizing the discovery result. Use the existing discover_providers, ProviderRegistry, and related helper names to centralize provider lookup instead of instantiating providers on every call.
58-69: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
max_messagesnot exposed through the facade.
ProviderRegistry.load_session(and the underlying provider) acceptmax_messages, but this public entrypoint drops it, so callers cannot bound transcript loading. Consider threading it through.♻️ Proposed change
-def load_session(provider_name: str, session_id: str): +def load_session( + provider_name: str, session_id: str, max_messages: Optional[int] = None +): @@ registry = discover_providers() - return registry.load_session(provider_name, session_id) + return registry.load_session(provider_name, session_id, max_messages=max_messages)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/discovery.py` around lines 58 - 69, The public load_session facade in discovery.py drops the max_messages option even though ProviderRegistry.load_session and the provider implementations support it, so thread that argument through this entrypoint. Update load_session to accept max_messages and pass it into registry.load_session alongside provider_name and session_id, keeping the signature aligned with ProviderRegistry.load_session so callers can bound transcript loading.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/providers/agy.py`:
- Around line 96-97: The max_messages check in the Agy transcript reader is
using the JSONL line index from the loop instead of the number of yielded
TranscriptEntry objects, so it does not actually limit emitted messages and is
off by one. Update the logic in the reading path around the loop that builds
TranscriptEntry values so it counts produced entries (not lines) and stops once
max_messages entries have been yielded, using the existing transcript parsing
flow to enforce the limit correctly.
- Around line 77-94: The session loader in the Agy transcript parsing flow stops
on a single bad JSON line because `json.loads` inside the generator is
unguarded. Update the `AGY` parser method that iterates the transcript file to
catch `json.JSONDecodeError`, log the malformed line with session context, and
continue to the next line. Keep yielding valid entries from `_parse_entry` and
preserve `prev_uuid` handling so one corrupt record does not abort the rest of
the session.
- Around line 326-340: The UUID generation in the tool call loop is not unique
when multiple tool calls share the same name within a single entry. Update the
`agy` provider logic in the loop that builds entries with `make_assistant_entry`
so the `uid` includes the per-tool loop position (or another unique per-call
value) in addition to `session_id`, `index`, and `name`, ensuring each
`parentUuid` chain remains unambiguous even when `name` repeats.
- Around line 61-72: `AgyProvider.load_session()` builds the transcript path
directly from caller-supplied `session_id`, which can allow `..` or path
separators to escape the intended `brain/<id>/.../transcript.jsonl` location.
Add validation on `session_id` before composing `transcript_file` so only safe
IDs are accepted, either by rejecting path separators/parent traversal or by
enforcing a strict allowed pattern. Keep the check near the existing path
construction in `load_session()` and fail fast with a clear error if the ID is
invalid.
---
Nitpick comments:
In `@claude_code_log/discovery.py`:
- Around line 21-29: Replace the duplicated discovery logic in discover_sessions
so the providers is None path delegates to
ProviderRegistry.discover_all_sessions instead of re-implementing the
availability check and iteration. Keep the explicit providers filter path using
discover_providers() and registry.get_provider(provider_name), but for the
default case call the registry method directly to stay aligned with
ProviderRegistry.discover_all_sessions and avoid divergence.
- Around line 45-55: The session discovery helpers are repeatedly rebuilding
provider state by calling discover_providers() inside each facade function,
which re-runs availability checks and duplicates work; update
discover_all_sessions, discover_sessions_by_provider, get_session_stats, and
load_session to reuse a shared ProviderRegistry, either by accepting an optional
registry parameter or by memoizing the discovery result. Use the existing
discover_providers, ProviderRegistry, and related helper names to centralize
provider lookup instead of instantiating providers on every call.
- Around line 58-69: The public load_session facade in discovery.py drops the
max_messages option even though ProviderRegistry.load_session and the provider
implementations support it, so thread that argument through this entrypoint.
Update load_session to accept max_messages and pass it into
registry.load_session alongside provider_name and session_id, keeping the
signature aligned with ProviderRegistry.load_session so callers can bound
transcript loading.
In `@claude_code_log/providers/registry.py`:
- Around line 29-36: The instantiate_registered method is swallowing all
provider initialization failures with a bare except and pass, which hides
misconfigured providers. Update this loop in Registry.instantiate_registered to
catch the failure, log a warning that includes the provider_class name and the
exception details, and then continue skipping that provider instead of silently
ignoring it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ce191374-cdf8-43c6-bc44-20d351a8eadd
📒 Files selected for processing (6)
claude_code_log/discovery.pyclaude_code_log/providers/__init__.pyclaude_code_log/providers/agy.pyclaude_code_log/providers/base.pyclaude_code_log/providers/claude.pyclaude_code_log/providers/registry.py
- Path traversal: validate session_id against hex+dash pattern - Malformed JSON: skip bad lines with logging instead of aborting - max_messages: count yielded entries, not raw JSONL lines - Duplicate UUIDs: include tool call index in UUID
|
Thanks @coderabbitai — all 4 items addressed:
|
|
Tip For best results, initiate chat on the files or code changes.
🐇✨ If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
claude_code_log/providers/agy.py (1)
43-55: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winFilter out session IDs that
load_session()will reject.Line 53 yields any directory name, but Lines 60-61 later reject anything outside
[a-f0-9-]+. That meansdiscover_sessions()can return sessions this provider cannot load ifbrain/contains any non-session subdirectory with a transcript file. Reuse_is_valid_session_id(session_dir.name)here so discovery and loading stay consistent.Suggested fix
for session_dir in brain_dir.iterdir(): if not session_dir.is_dir(): continue + if not self._is_valid_session_id(session_dir.name): + continue transcript_file = ( session_dir / ".system_generated" / "logs" / "transcript.jsonl" )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/providers/agy.py` around lines 43 - 55, discover_sessions() in the agy provider is returning directories that load_session() will later reject, so align discovery with loading. In the loop over session_dir entries, reuse _is_valid_session_id(session_dir.name) before yielding SessionInfo so only valid session IDs are discovered, keeping the provider’s session listing consistent with its load rules.
♻️ Duplicate comments (1)
claude_code_log/providers/agy.py (1)
99-107: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winEnforce
max_messagesinside the inner yield loop.This now counts emitted
TranscriptEntryobjects, but the limit is still checked only after_parse_entry()finishes. A singlePLANNER_RESPONSEwith multiple tool calls can therefore emit past the requested cap before Line 107 breaks.Suggested fix
entry = cast(dict[str, Any], raw_entry) for transcript_entry in self._parse_entry( entry, session_id, message_count, prev_uuid ): + if max_messages is not None and message_count >= max_messages: + return if hasattr(transcript_entry, "uuid"): prev_uuid = cast(Any, transcript_entry).uuid yield transcript_entry message_count += 1 - - if max_messages is not None and message_count >= max_messages: - break🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/providers/agy.py` around lines 99 - 107, The max_messages cap is only enforced after _parse_entry() finishes, so a single entry can yield too many TranscriptEntry objects before the loop stops. In the iterator that calls _parse_entry in agy.py, add the max_messages check inside the inner for-loop immediately after each yield in the transcript_entry emission path, and break out as soon as message_count reaches the limit while preserving prev_uuid updates in the same loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@claude_code_log/providers/agy.py`:
- Around line 43-55: discover_sessions() in the agy provider is returning
directories that load_session() will later reject, so align discovery with
loading. In the loop over session_dir entries, reuse
_is_valid_session_id(session_dir.name) before yielding SessionInfo so only valid
session IDs are discovered, keeping the provider’s session listing consistent
with its load rules.
---
Duplicate comments:
In `@claude_code_log/providers/agy.py`:
- Around line 99-107: The max_messages cap is only enforced after _parse_entry()
finishes, so a single entry can yield too many TranscriptEntry objects before
the loop stops. In the iterator that calls _parse_entry in agy.py, add the
max_messages check inside the inner for-loop immediately after each yield in the
transcript_entry emission path, and break out as soon as message_count reaches
the limit while preserving prev_uuid updates in the same loop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3ccea18d-7d93-45c5-82e5-deb28723c472
📒 Files selected for processing (1)
claude_code_log/providers/agy.py
|
Woops ;-) Sorry, I don't know why I just noticed #242 and not this follow-up! My bad. I'll review them together, hopefully later today. |
|
@cboos -- No worries, I appreciate your perspective. I've been tinkering with making this PR more meaningful by adding cli flags and/or output rendering, but I'm quickly hitting walls that require much bigger changes. For example, I found that AGY session data wasn't populating the search index (no session previews, no timestamps), and the search result display assumed Claude's URL format. I've identified fixes for these locally, but they touch the CLI data pipeline and the search template, which I respect is a lot to change in one PR. Happy to push an updated version of #243 with these fixes, or split the index/search rendering into a separate follow-up if you'd prefer to keep each PR focused. Finally, I also noticed a pre-existing cosmetic issue where the <!-- Search Component --> comment is duplicated. The index.html has its own comment, then the included search.html also starts with one. I was getting tripped up for a while trying to figure out if I'd introduced that bug before I finally realized it was preexisting. It doesn't break functionality on Claude indexes, but just to flag. |
|
(Claude) Reviewed #243 on top of #242, and re-tested on a real ~500-entry Antigravity session. Together these two PRs resolve every concrete finding from the earlier review of #225:
Re-test on the same session that previously lost 19 of 47 entries: 49 → 68 entries, ~49 flat roots → 1 threaded root, and the Where to take it next. This is a good foundation, but there's still a gap to get agy output really on par with Claude Code's own log rendering: structural fidelity. Right now tool actions render as labeled assistant text (
That's the step from "all the content is now present" to "reads like a Claude transcript." Nice work on the cleanup and the split. |
Depends on #242 (base abstraction). Adds the AGY (Antigravity) provider as the second fully-realized adapter.
What's included
AgyProvider— discovers and loads sessions from~/.gemini/antigravity-cli/brain/parentUuidchaining creates a proper DAG instead of flat entriesUSER_INPUT→ user message (extracts<USER_REQUEST>content)PLANNER_RESPONSE→ assistant message + tool callsCHECKPOINT→ compaction summariesLIST_DIRECTORY→ tool outputGENERIC→ uncategorized model outputRUN_COMMAND→ shell command executionsVIEW_FILE→ file readsCODE_ACTION→ code modifications (edits, writes)What's not included (coming as separate PRs)
--providerCLI flag wiring--all-providerspath filtering compositionVerification
No behavior change to existing functionality — AGY is purely additive. The provider follows the same
BaseProviderABC established in #242.Summary by CodeRabbit