Skip to content

Fix deadlock when bundle script hooks write large stderr output#5499

Open
simonfaltum wants to merge 2 commits into
mainfrom
simonfaltum/b3-script-hook-deadlock
Open

Fix deadlock when bundle script hooks write large stderr output#5499
simonfaltum wants to merge 2 commits into
mainfrom
simonfaltum/b3-script-hook-deadlock

Conversation

@simonfaltum

Copy link
Copy Markdown
Member

Why

Found during a full-repo review of the CLI. Bundle script hooks (experimental.scripts) can deadlock a deploy forever. Hook output was read sequentially: stdout to EOF first, then stderr. A hook that writes more than the OS pipe buffer (about 64KiB) to stderr while stdout is still open blocks on the stderr write and never closes stdout, so the script and the CLI wait on each other indefinitely.

Changes

Before, stderr was only read after stdout reached EOF (via io.MultiReader); now stderr is drained concurrently, so the hook can never block on a full stderr pipe. In bundle/scripts/scripts.go, a goroutine spools stderr to memory while stdout is streamed line by line, and the spooled stderr is logged after stdout EOF. This keeps the existing stdout-then-stderr output order (no interleaving), so observable output is unchanged for hooks that complete today. The line logging loop is extracted into a logOutput helper and still emits a final line without a trailing newline.

Test plan

  • New unit test TestExecuteLargeStderrOutputDoesNotDeadlock writes well over the pipe buffer to stderr before touching stdout; verified it deadlocks against the previous implementation (killed by the test context after 60s) and passes in ~0.05s with the fix
  • go test -race ./bundle/scripts passes, including the existing no-trailing-newline test
  • Acceptance tests under acceptance/bundle/scripts pass for both engines with no output changes
  • ./task fmt-q, ./task lint-q, ./task checks all clean

This pull request and its description were written by Isaac.

Script hook output was read sequentially (stdout to EOF, then stderr), so a hook writing more than the OS pipe buffer (~64KiB) to stderr while stdout was still open blocked forever. Spool stderr to memory in a goroutine while streaming stdout, then log it after stdout EOF, keeping the existing output order.

Co-authored-by: Isaac
@simonfaltum simonfaltum requested a review from denik June 9, 2026 20:07
@simonfaltum simonfaltum temporarily deployed to test-trigger-is June 9, 2026 20:08 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is June 9, 2026 20:08 — with GitHub Actions Inactive
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

Based on git history, these people are best suited to review:

  • @denik -- recent work in bundle/scripts/

Eligible reviewers: @andrewnester, @anton-107, @janniklasrose, @lennartkats-db, @pietern, @shreyas-goenka

Suggestions based on git history. See OWNERS for ownership rules.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Commit: 06046b5

Run: 27238608477

Env 🟨​KNOWN 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 15 261 928 8:58
💚​ aws windows 7 15 263 926 11:18
💚​ aws-ucws linux 7 15 357 842 7:47
💚​ aws-ucws windows 7 15 359 840 12:28
💚​ azure linux 1 17 264 926 7:53
💚​ azure windows 1 17 266 924 11:19
💚​ azure-ucws linux 1 17 362 838 8:48
💚​ azure-ucws windows 1 17 364 836 12:06
💚​ gcp linux 1 17 260 929 7:15
💚​ gcp windows 1 17 262 927 10:56
22 interesting tests: 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 💚​R 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 💚​R 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 💚​R 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 29 slowest tests (at least 2 minutes):
duration env testname
6:46 aws windows TestAccept
6:17 azure windows TestAccept
6:03 azure-ucws windows TestAccept
6:01 gcp windows TestAccept
6:00 aws-ucws windows TestAccept
4:28 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:20 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:01 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:00 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:35 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:34 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:19 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:12 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:05 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:03 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:00 azure linux TestAccept
2:59 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:57 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:56 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:56 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:54 gcp linux TestAccept
2:50 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:50 azure-ucws linux TestAccept
2:49 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:46 aws-ucws linux TestAccept
2:41 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:41 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:38 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:32 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants