Lazy-load chat histories, fast list summaries, and mtime disk cache (closes #84)#88
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a disk-backed fingerprinted summary cache, composer-scoped KV loaders, cached composer→workspace mapping, summary-only and per-composer tab endpoints, refactors tab assembly into a shared helper, updates API routing for ChangesLazy-load workspace tabs with summary cache and scoped loaders
Sequence Diagram(s) sequenceDiagram
participant WorkspaceHTML as Workspace HTML
participant APIWorkspaces as API /api/workspaces
participant WorkspaceTabs as services.workspace_tabs
participant WorkspaceDB as services.workspace_db
WorkspaceHTML->>APIWorkspaces: GET /api/workspaces?nocache=...
APIWorkspaces->>WorkspaceTabs: list_workspace_tab_summaries(workspace_id, ..., nocache)
WorkspaceTabs->>WorkspaceDB: open_global_db() / cached composer→workspace lookup
WorkspaceDB-->>WorkspaceTabs: composer rows / cached map
WorkspaceTabs-->>APIWorkspaces: summary tabs payload
WorkspaceHTML->>APIWorkspaces: GET /api/workspaces/{workspace_id}/tabs/{composer_id}
APIWorkspaces->>WorkspaceTabs: assemble_single_tab(workspace_id, composer_id, ...)
WorkspaceTabs->>WorkspaceDB: load_bubbles_for_composer(composer_id), load_message_request_context_for_composer, load_code_block_diffs_for_composer
WorkspaceDB-->>WorkspaceTabs: scoped KV rows
WorkspaceTabs-->>APIWorkspaces: full tab payload (bubbles, codeBlockDiffs)
APIWorkspaces-->>WorkspaceHTML: JSON response
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
templates/workspace.html (1)
124-175:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPossible stale-render race when switching tabs quickly.
selectTabis now async and awaits a network fetch before callingrenderChat. If a user clicks conversation A (slow fetch) then conversation B, B may render first and A's later-resolving fetch will overwritemain-contentwith A's content while the sidebar shows B as active. Consider tracking the latest requested id and bailing out of stale resolutions.🛠️ Suggested guard
async function selectTab(id) { const summary = allTabs.find(t => t.id === id); if (!summary) return; + selectTab._latest = id;Then after each
await, before mutatingselectedTab/main-content:if (selectTab._latest !== id) return; // a newer selection superseded this one🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@templates/workspace.html` around lines 124 - 175, selectTab can race when multiple async fetches complete out-of-order; set a per-call marker (e.g. assign selectTab._latest = id at the start of selectTab) and after any await or before mutating shared state (tabCache, selectedTab, main-content, calling renderChat) check if selectTab._latest !== id and if so return early to avoid overwriting a newer tab; apply these guards around the fetch/try-catch resolution and any other async points so only the most-recent selection updates the UI.
🧹 Nitpick comments (1)
tests/test_summary_cache.py (1)
37-66: ⚡ Quick winAdd a round-trip test with a realistic fingerprint.
Current tests only exercise scalar-only fingerprints, so they don't catch the tuple→list serialization mismatch in
fingerprint_workspace_storage(see theservices/summary_cache.pyfinding). A test that callsfingerprint_workspace_storage(soworkspace_filesis populated), thenset_cached_projects/get_cached_projectswith that fingerprint, would guard against the cache silently missing on every repeat load.💚 Suggested regression test
def test_cache_hit_with_workspace_files_fingerprint(self): with tempfile.TemporaryDirectory() as ws: entry_dir = os.path.join(ws, "entry1") os.makedirs(entry_dir) with open(os.path.join(entry_dir, "state.vscdb"), "wb") as f: f.write(b"x") entries = [{"name": "entry1"}] fp = fingerprint_workspace_storage(ws, entries, global_db_path=None, rules=[]) set_cached_projects(fp, [{"id": "a"}], []) # Recompute the fingerprint to mimic a fresh process/page load. fp2 = fingerprint_workspace_storage(ws, entries, global_db_path=None, rules=[]) self.assertIsNotNone(get_cached_projects(fp2))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_summary_cache.py` around lines 37 - 66, Add a regression test that round-trips a realistic fingerprint containing workspace_files to catch the tuple→list serialization mismatch: use fingerprint_workspace_storage to create fp (with a temp workspace and an entry that creates a state.vscdb file), call set_cached_projects(fp, projects, warnings), then recompute fp2 = fingerprint_workspace_storage(...) to mimic a fresh process and assert get_cached_projects(fp2) is not None; this ensures set_cached_projects/get_cached_projects handle the workspace_files shape consistently (see fingerprint_workspace_storage, set_cached_projects, get_cached_projects, and services/summary_cache.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/summary_cache.py`:
- Around line 64-88: The fingerprint uses Python tuples in workspace_files but
JSON serialization converts tuples to lists, so cached fingerprints (lists)
never equal freshly computed fingerprints (tuples); change the construction of
workspace_files in the fingerprint generator to emit JSON-stable lists instead
of tuples (i.e., append [f"{name}/{rel}", mtime] rather than (f"{name}/{rel}",
mtime)) so set_cached_* and get_cached_* compare identical types, and optionally
make _fingerprint_equal normalize sequence types (e.g., convert tuples to lists
or use deep structural equality) to be robust to future tuple/list differences.
In `@services/workspace_db.py`:
- Around line 294-299: COMPOSER_ROWS_WITH_HEADERS_SQL's NOT LIKE predicate only
excludes the compact empty-array form and should also exclude the spaced JSON
form; update the SQL string constant COMPOSER_ROWS_WITH_HEADERS_SQL so the
predicate excludes both "\"fullConversationHeadersOnly\":[]" and
"\"fullConversationHeadersOnly\": []" (e.g. add an additional AND value NOT LIKE
'%fullConversationHeadersOnly\": []%' or normalize/remove whitespace in the
filter) to ensure rows with empty headers are filtered out.
In `@tests/test_workspace_listing_performance.py`:
- Around line 88-109: The test's spy patch is targeting
services.workspace_db.open_global_db but list_workspace_projects calls the
module-local open_global_db imported into services.workspace_listing, so update
the patch to target services.workspace_listing.open_global_db (e.g. import
services.workspace_listing as _ws_listing_mod and use
patch.object(_ws_listing_mod, "open_global_db", _spying_open_global_db)) so the
spy (_spying_open_global_db) actually intercepts calls used by
list_workspace_projects; also ensure executed_queries is asserted non-empty
before filtering for "bubbleId:%" to avoid vacuous success (refer to symbols
_spying_open_global_db, executed_queries, list_workspace_projects,
open_global_db, and patch.object).
In `@tests/test_workspace_tabs_summary.py`:
- Around line 111-133: The SQL spy wrapper in _collect_queries is patching
services.workspace_db.open_global_db but workspace_tabs imports open_global_db
directly, so change the patch target to services.workspace_tabs.open_global_db
in _collect_queries to ensure the wrapper runs; additionally harden the tests to
assert that the captured executed list is non-empty (fail the test if no SQL was
recorded) and update test_no_global_bubble_scan to call the exercised function
with nocache=True to force DB access; use the unique names _collect_queries,
executed, and test_no_global_bubble_scan/test_scoped_bubble_query_only to locate
the spots to modify.
---
Outside diff comments:
In `@templates/workspace.html`:
- Around line 124-175: selectTab can race when multiple async fetches complete
out-of-order; set a per-call marker (e.g. assign selectTab._latest = id at the
start of selectTab) and after any await or before mutating shared state
(tabCache, selectedTab, main-content, calling renderChat) check if
selectTab._latest !== id and if so return early to avoid overwriting a newer
tab; apply these guards around the fetch/try-catch resolution and any other
async points so only the most-recent selection updates the UI.
---
Nitpick comments:
In `@tests/test_summary_cache.py`:
- Around line 37-66: Add a regression test that round-trips a realistic
fingerprint containing workspace_files to catch the tuple→list serialization
mismatch: use fingerprint_workspace_storage to create fp (with a temp workspace
and an entry that creates a state.vscdb file), call set_cached_projects(fp,
projects, warnings), then recompute fp2 = fingerprint_workspace_storage(...) to
mimic a fresh process and assert get_cached_projects(fp2) is not None; this
ensures set_cached_projects/get_cached_projects handle the workspace_files shape
consistently (see fingerprint_workspace_storage, set_cached_projects,
get_cached_projects, and services/summary_cache.py).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c8f448e2-81de-4319-86b7-18abdc0bb2c8
📒 Files selected for processing (13)
CHANGELOG.mdapi/workspaces.pyservices/summary_cache.pyservices/workspace_db.pyservices/workspace_listing.pyservices/workspace_tabs.pystatic/css/style.csstemplates/workspace.htmltests/test_parse_warnings.pytests/test_project_layouts_dict_shape.pytests/test_summary_cache.pytests/test_workspace_listing_performance.pytests/test_workspace_tabs_summary.py
Should fixservices/workspace_tabs.py:693 — In assemble_single_tab, replace load_project_layouts_map(global_db) with load_project_layouts_for_composer(global_db, composer_id) (and a minimal map {composer_id: …} for determine_project_for_conversation). (Per-tab load still full-scans all messageRequestContext:% rows, contradicting the scoped-load design and CHANGELOG wording for the single-tab endpoint.) services/workspace_tabs.py:696-701 — Gate the broad composerData:% alias query behind invalid_workspace_ids (same pattern as list_workspace_tab_summaries / listing), or scope it to the one composer. (Every tab open re-scans all composers for alias resolution even when no invalid workspaces exist.) api/workspaces.py:167-192 — Add Flask test-client coverage for GET /api/workspaces//tabs?summary=1 and GET /api/workspaces//tabs/<composer_id> (status, shape, no bubbles on summary). (New user-facing routes are only exercised via direct service calls; aligns with eval verification void.) Nice to haveservices/workspace_tabs.py:594 — messageCount uses len(headers) while _assemble_tab_from_composer_data may omit empty bubbles; consider counting renderable messages or documenting the difference. (Sidebar count can disagree with opened conversation.) templates/workspace.html:113 — Prefer data-id + addEventListener over onclick="selectTab('${tab.id}')". (Avoids breakage if IDs ever contain quotes; minor hardening.) services/workspace_listing.py:167-224, services/workspace_tabs.py:531-606 — Shared helper for composer-row → project assignment loop. (Reduces drift between list, summary, and full assembly under monolith-duplication.) |
|
@CodeRabbit, I t hink the last commit did address my finding fully, but can you also verify if those are addressed? i mean essential finding mostly. optional is not blocker |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== services/workspace_tabs.py: assemble_single_tab function ==="
grep -n "assemble_single_tab\|load_project_layouts\|load_project_layouts_for_composer\|load_project_layouts_map\|composerData:%\|invalid_workspace_ids\|alias" services/workspace_tabs.py | head -80
echo ""
echo "=== Full assemble_single_tab function ==="
# Find the function start and show ~80 lines from there
start=$(grep -n "^def assemble_single_tab" services/workspace_tabs.py | head -1 | cut -d: -f1)
if [ -n "$start" ]; then
sed -n "${start},$((start+120))p" services/workspace_tabs.py
fi
echo ""
echo "=== api/workspaces.py: new routes ==="
cat api/workspaces.py
echo ""
echo "=== tests/test_api_endpoints.py: new test coverage ==="
cat tests/test_api_endpoints.pyLength of output: 27950
Finding 1 —
|
|
@bradjin8 I think @CodeRabbit's analysis on Finding 1 is accurate. Need to address it fully, using scope |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
start=$(grep -n "^def assemble_single_tab" services/workspace_tabs.py | head -1 | cut -d: -f1)
if [ -n "$start" ]; then
sed -n "${start},$((start+160))p" services/workspace_tabs.py
fiLength of output: 6880
Current code (lines ~696–712): project_layouts_map: dict[str, list] = {}
invalid_workspace_aliases: dict[str, str] = {}
if invalid_workspace_ids:
project_layouts_map = load_project_layouts_map(global_db) # ← full messageRequestContext:% scan
composer_rows_for_aliases = _safe_fetchall(COMPOSER_ROWS_WITH_HEADERS_SQL)
invalid_workspace_aliases = infer_invalid_workspace_aliases(
composer_rows=composer_rows_for_aliases,
project_layouts_map=project_layouts_map,
project_name_map=project_name_map,
workspace_path_map=workspace_path_map,
workspace_entries=workspace_entries,
bubble_map={},
composer_id_to_ws=composer_id_to_ws,
invalid_workspace_ids=invalid_workspace_ids,
)
else:
project_layouts_map[composer_id] = load_project_layouts_for_composer(
global_db, composer_id,
)Proposed fix — collapse both branches, always use scoped loader: project_layouts_map: dict[str, list] = {}
invalid_workspace_aliases: dict[str, str] = {}
# Always scoped: only load messageRequestContext rows for this composer.
project_layouts_map[composer_id] = load_project_layouts_for_composer(
global_db, composer_id,
)
if invalid_workspace_ids:
# Alias resolution still needs the composer roster, but project layouts
# are intentionally limited to this composer (single-tab scope).
composer_rows_for_aliases = _safe_fetchall(COMPOSER_ROWS_WITH_HEADERS_SQL)
invalid_workspace_aliases = infer_invalid_workspace_aliases(
composer_rows=composer_rows_for_aliases,
project_layouts_map=project_layouts_map,
project_name_map=project_name_map,
workspace_path_map=workspace_path_map,
workspace_entries=workspace_entries,
bubble_map={},
composer_id_to_ws=composer_id_to_ws,
invalid_workspace_ids=invalid_workspace_ids,
)What this changes:
The |
Summary
Implements issue #84: split summary and full-assembly code paths so list views avoid scanning global
bubbleId:%rows, the workspace UI lazy-loads conversation bodies on tab selection, and repeat page loads are served from an mtime-keyed disk cache.GET /api/workspaces) — no global bubble scan; SQL-filteredcomposerData; per-composer MRC only when needed; cachedcomposer_id_to_wsand project listGET /api/workspaces/<id>/tabs?summary=1returns titles/metadata only; full payloads viaGET /api/workspaces/<id>/tabs/<composer_id>~/.cache/cursor-chat-browser/; invalidate on storage mtimes; bypass with?nocache=1orCURSOR_CHAT_BROWSER_NOCACHE=1/tabs, export, and search — unchanged; monolithic/tabsstill available for backward compatibilityProblem
On large local Cursor datasets, the home page and workspace page each rescanned global KV storage on every load (often 1–2+ minutes). The UI blocked on full tab assembly before the sidebar could render.
Solution
Backend
services/workspace_listing.pyload_bubble_map; SQL filter for non-empty headers; lazyload_project_layouts_for_composer; disk cache wrapperservices/workspace_tabs.pylist_workspace_tab_summaries,assemble_single_tab, shared_assemble_tab_from_composer_data; scoped bubble/MRC/diff loadersservices/workspace_db.pybuild_composer_id_to_workspace_id_cachedservices/summary_cache.pyapi/workspaces.py?summary=1,/tabs/<composer_id>,?nocache=1Frontend
templates/workspace.html— fetch summary tabs on load; lazy-fetch full tab on selection;tabCachefor loaded conversationsstatic/css/style.css— stable grid width during load;.main-content.is-loadingfor centered spinnerTests (356 pass)
tests/test_workspace_listing_performance.py— spy: no globalbubbleId:%on list pathtests/test_workspace_tabs_summary.py— summary/single-tab scoped queries and payload shapetests/test_summary_cache.py— cache hit/miss and fingerprint invalidationPerformance notes
GET /api/workspaces?nocache=1Document representative row counts and timings from your local fixture in review comments if helpful.
Test plan
pytest— all tests pass locallyGET /api/workspaces/<id>/tabs(nosummary=1) still work?nocache=1bypasses cache; cache files appear under~/.cache/cursor-chat-browser/Closes #84
Summary by CodeRabbit
New Features
Performance
Style
Documentation
Tests
Bug Fixes