Skip to content

fix(server): skip /load_expr pipeline for warm sessions; share xorq backend (#896, #899)#901

Open
paddymul wants to merge 5 commits into
mainfrom
fix/896-899-load-expr-perf
Open

fix(server): skip /load_expr pipeline for warm sessions; share xorq backend (#896, #899)#901
paddymul wants to merge 5 commits into
mainfrom
fix/896-899-load-expr-perf

Conversation

@paddymul

@paddymul paddymul commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Test plan

  • TestLoadExprPerfFixes::test_warm_session_skips_pipeline — patches load_expr_build_dir to raise on a second call; asserts 200 + correct metadata returned from cache
  • TestLoadExprPerfFixes::test_new_session_always_runs_pipeline — two distinct session ids sharing a build_dir each trigger a full pipeline run
  • TestLoadExprPerfFixes::test_shared_backend_singleton — patches xorq.config.default_backend with a tracker; asserts both calls return the same backend id
  • Full tests/unit/server/ suite: 137 passed, 1 skipped, 0 failures

🤖 Generated with Claude Code

paddymul and others added 2 commits June 9, 2026 12:08
…ckend (#896, #899)

#899: LoadExprHandler.post now short-circuits for repeat POSTs that supply
the same session_id + build_dir — returns cached metadata without re-running
the ~3–4 s stat pipeline.

#896: load_expr_build_dir calls xorq.config.default_backend() instead of
connect() directly, pre-warming xorq's process-wide singleton so internal
paths (deferred_reads_to_memtables) reuse the same SessionContext.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c4bb7866e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread buckaroo/server/handlers.py Outdated
# a prior response (UUID-generated ids are always new).
sessions = self.application.settings["sessions"]
existing = sessions.get(session_id)
if existing and getattr(existing, "build_dir", None) == build_dir and existing.metadata:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Require cached /load_expr sessions to still be xorq

If a user loads an expression, then reuses the same session with /load for a pandas/polars file, and later posts /load_expr again with the original build_dir, this cache branch fires even though LoadHandler.post only clears xorq_dataflow/expr and leaves build_dir behind. The handler then returns the file session's cached metadata without rebuilding expr/xorq_dataflow or switching backend back to xorq, so the UI keeps serving the stale file data. Please gate the fast path on the existing session still being a valid xorq session, not just matching build_dir.

Useful? React with 👍 / 👎.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.16.dev27245887969

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.16.dev27245887969

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.16.dev27245887969" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

paddymul and others added 2 commits June 9, 2026 16:17
On a cache hit, read the result parquet directly rather than routing the
query back through cache().execute(), which re-plans and re-executes the
whole expression through DataFusion just to reach the cached node (~30ms for
a single-table expr, ~125ms for a join, vs ~1-3ms for the read). Across the
per-column histogram queries this cut a warm citibike load from ~0.65s to
~0.08s and a diff compare-load from ~3.0s to ~0.5s. On a miss the path is
unchanged (compute + write); a corrupt or partial cache file falls back to
recompute.

Also skip _maybe_materialize when a cache_storage is set: the
__buckaroo_histo_mat_<uuid> table name it injects changes the expression
token on every load, so the snapshot cache never hit. Skipping it keeps the
per-column queries content-addressed and stable across restarts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pass force_reload=true to bypass the warm-session short-circuit and re-run
the full stat pipeline against the same build_dir. Useful for benchmarking
the cache path and for hosts that need to recompute after editing project
stats without minting a new session id.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nfig

The #899 warm-session early-exit returned cached metadata whenever the
session_id + build_dir matched, silently ignoring any config-bearing
fields (component_config, column_config_overrides, extra_grid_config,
init_sd, skip_stat_columns) supplied on the repeat POST. Gate the
early-exit on `not has_config` so a config change re-runs the pipeline;
force_reload still bypasses unconditionally.

Tests:
- test_force_reload_bypasses_warm_session, test_warm_session_with_new_
  config_reruns (server): plain warm repeat skips the pipeline, but
  force_reload or new config re-runs it.
- TestMaterialization::test_skipped_when_cache_storage_set: a filter
  chain materializes without cache_storage and is returned untouched
  with it (no __buckaroo_histo_mat_ table poisoning the cache key).
- TestCacheStorageExecute::test_cache_hit_reads_parquet_directly: spies
  pd.read_parquet — 0 reads on a cold miss, 1 read on the warm hit.
- Drop redundant in-method shutil/tempfile imports in
  test_shared_backend_singleton.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@paddymul paddymul deployed to testpypi June 10, 2026 00:59 — with GitHub Actions Active
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant