fix(server): skip /load_expr pipeline for warm sessions; share xorq backend (#896, #899)#901
fix(server): skip /load_expr pipeline for warm sessions; share xorq backend (#896, #899)#901paddymul wants to merge 5 commits into
Conversation
…ckend (#896, #899) #899: LoadExprHandler.post now short-circuits for repeat POSTs that supply the same session_id + build_dir — returns cached metadata without re-running the ~3–4 s stat pipeline. #896: load_expr_build_dir calls xorq.config.default_backend() instead of connect() directly, pre-warming xorq's process-wide singleton so internal paths (deferred_reads_to_memtables) reuse the same SessionContext. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c4bb7866e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # a prior response (UUID-generated ids are always new). | ||
| sessions = self.application.settings["sessions"] | ||
| existing = sessions.get(session_id) | ||
| if existing and getattr(existing, "build_dir", None) == build_dir and existing.metadata: |
There was a problem hiding this comment.
Require cached /load_expr sessions to still be xorq
If a user loads an expression, then reuses the same session with /load for a pandas/polars file, and later posts /load_expr again with the original build_dir, this cache branch fires even though LoadHandler.post only clears xorq_dataflow/expr and leaves build_dir behind. The handler then returns the file session's cached metadata without rebuilding expr/xorq_dataflow or switching backend back to xorq, so the UI keeps serving the stale file data. Please gate the fast path on the existing session still being a valid xorq session, not just matching build_dir.
Useful? React with 👍 / 👎.
📦 TestPyPI package publishedpip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.16.dev27245887969or with uv: uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.16.dev27245887969MCP server for Claude Codeclaude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.16.dev27245887969" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table📖 Docs preview🎨 Storybook preview |
On a cache hit, read the result parquet directly rather than routing the query back through cache().execute(), which re-plans and re-executes the whole expression through DataFusion just to reach the cached node (~30ms for a single-table expr, ~125ms for a join, vs ~1-3ms for the read). Across the per-column histogram queries this cut a warm citibike load from ~0.65s to ~0.08s and a diff compare-load from ~3.0s to ~0.5s. On a miss the path is unchanged (compute + write); a corrupt or partial cache file falls back to recompute. Also skip _maybe_materialize when a cache_storage is set: the __buckaroo_histo_mat_<uuid> table name it injects changes the expression token on every load, so the snapshot cache never hit. Skipping it keeps the per-column queries content-addressed and stable across restarts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pass force_reload=true to bypass the warm-session short-circuit and re-run the full stat pipeline against the same build_dir. Useful for benchmarking the cache path and for hosts that need to recompute after editing project stats without minting a new session id. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nfig The #899 warm-session early-exit returned cached metadata whenever the session_id + build_dir matched, silently ignoring any config-bearing fields (component_config, column_config_overrides, extra_grid_config, init_sd, skip_stat_columns) supplied on the repeat POST. Gate the early-exit on `not has_config` so a config change re-runs the pipeline; force_reload still bypasses unconditionally. Tests: - test_force_reload_bypasses_warm_session, test_warm_session_with_new_ config_reruns (server): plain warm repeat skips the pipeline, but force_reload or new config re-runs it. - TestMaterialization::test_skipped_when_cache_storage_set: a filter chain materializes without cache_storage and is returned untouched with it (no __buckaroo_histo_mat_ table poisoning the cache key). - TestCacheStorageExecute::test_cache_hit_reads_parquet_directly: spies pd.read_parquet — 0 reads on a cold miss, 1 read on the warm hit. - Drop redundant in-method shutil/tempfile imports in test_shared_backend_singleton. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
LoadExprHandler.postnow short-circuits for repeat POSTs that supply the samesession_id+build_dir. Returns cached metadata without re-running the ~3–4 s stat pipeline. Only fires when the caller explicitly passes back a priorsession_id; UUID-generated (new) sessions are unaffected.load_expr_build_dirnow callsxorq.config.default_backend()instead ofconnect()directly. Pre-warms xorq's process-wide singleton (xorq.config.options.default_backend) so internal paths likedeferred_reads_to_memtablesreuse the sameSessionContexton every call.Test plan
TestLoadExprPerfFixes::test_warm_session_skips_pipeline— patchesload_expr_build_dirto raise on a second call; asserts 200 + correct metadata returned from cacheTestLoadExprPerfFixes::test_new_session_always_runs_pipeline— two distinct session ids sharing a build_dir each trigger a full pipeline runTestLoadExprPerfFixes::test_shared_backend_singleton— patchesxorq.config.default_backendwith a tracker; asserts both calls return the same backend idtests/unit/server/suite: 137 passed, 1 skipped, 0 failures🤖 Generated with Claude Code