Skip to content

test(spanner): make samples instance and backup cleanup robust against FailedPrecondition and quota exhaustion#17141

Merged
sakthivelmanii merged 1 commit into
mainfrom
fix-samples-conftest-leak
May 15, 2026
Merged

test(spanner): make samples instance and backup cleanup robust against FailedPrecondition and quota exhaustion#17141
sakthivelmanii merged 1 commit into
mainfrom
fix-samples-conftest-leak

Conversation

@sakthivelmanii
Copy link
Copy Markdown
Contributor

@sakthivelmanii sakthivelmanii commented May 14, 2026

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Root Cause

During Kokoro continuous test runs for Cloud Spanner samples, tests restore databases from backups. The restored databases hold active backend references to the backups (backup.referencing_databases) while optimizing in the background.
During module teardown, conftest.py calls database.drop() and immediately iterates to backup.delete(). Because database.drop() is asynchronous on the Spanner backend, backup.delete() frequently fails with a FailedPrecondition error (backup in use). This unhandled exception crashes the fixture teardown generator, skipping instance.delete() and leaking instances and backups.
As these leaked resources accumulate across frequent Kokoro runs in the shared test project, subsequent runs fail due to ResourceExhausted (quota exceeded) errors.

Solution

  • Created a robust retry helper (retry_cleanup) that retries on both ResourceExhausted and FailedPrecondition.
  • Isolated individual database drops and backup deletions within try...except exceptions.GoogleAPICallError blocks so that an error on a single resource never aborts the scrubbing generator.
  • Unified this robust scrubbing logic across cleanup_old_instances, sample_instance, and multi_region_instance teardown fixtures.

@sakthivelmanii sakthivelmanii requested a review from a team as a code owner May 14, 2026 19:39
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the robustness of the Spanner sample cleanup process by centralizing instance scrubbing logic into a shared helper function and implementing a more aggressive retry strategy for resource exhaustion and failed preconditions. The updates also ensure that databases are explicitly dropped during the cleanup of an instance. A review comment suggests further improving the error handling in the cleanup helper by catching a broader range of API errors and logging them as warnings to prevent silent failures while maintaining visibility for debugging.


retry_429(to_scrub.delete)()
retry_cleanup(to_scrub.delete)()
except exceptions.NotFound:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent the cleanup process from aborting due to a failure on a single instance, catch the broader exceptions.GoogleAPICallError instead of just NotFound. As this is a background task designed for retries, ensure the exception is logged as a warning to provide debugging information without creating excessive error noise.

Suggested change
except exceptions.NotFound:
except exceptions.GoogleAPICallError as e:
logger.warning(f"Error during cleanup: {e}")
References
  1. For exceptions in background tasks that are designed to be retried, log them as warnings rather than errors to reduce noise from transient, recoverable failures.
  2. Avoid broad except Exception: blocks that silently return None. Instead, log the exception to aid in debugging and prevent masking underlying issues.

@sakthivelmanii sakthivelmanii merged commit 2670eab into main May 15, 2026
31 checks passed
@sakthivelmanii sakthivelmanii deleted the fix-samples-conftest-leak branch May 15, 2026 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants