client/k8s_helper.py watches the SandboxClaim resource until it resolves the underlying sandbox name, but it ignores any failure conditions reported by the controller, causing it to hang for the full 180-second timeout when the claim has already failed (e.g., with a ReconcilerError due to missing warm pods when defaulting to cold pods).
During a large load test, the sandboxclaim controller raised ReconcilerError on several claims; the client did not capture these failures, producing multiple 180-second hangs.
The client should fail fast and retry:
resolve_sandbox_name should actively check the claim's Ready condition; if it is False (ReconcilerError or Failed), the method should raise an exception immediately, allowing the application to fail fast and trigger the new retry loop instead of staying stuck in "Created" status.
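A minimal sketch of the fail-fast check, assuming the SandboxClaim follows the standard Kubernetes condition convention (a `status.conditions` list with `type`, `status`, `reason`, `message` fields) and that the resolved name lives in a `status.sandboxName` field; the actual field names in the CRD may differ:

```python
from typing import Optional


class ClaimFailedError(Exception):
    """Raised when the SandboxClaim reports a failed Ready condition."""


def check_claim(claim: dict) -> Optional[str]:
    """Inspect one observed state of a SandboxClaim.

    Returns the sandbox name if the claim has resolved, None if it is
    still pending. Raises ClaimFailedError as soon as the Ready
    condition is False with a terminal reason (ReconcilerError or
    Failed), so the caller can fail fast and retry instead of waiting
    out the 180-second timeout.
    """
    status = claim.get("status", {})
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "False":
            if cond.get("reason") in ("ReconcilerError", "Failed"):
                raise ClaimFailedError(
                    f"claim failed: {cond.get('reason')}: "
                    f"{cond.get('message', '')}"
                )
    # Hypothetical field: where the controller writes the resolved name.
    return status.get("sandboxName")
```

resolve_sandbox_name's existing watch loop would call this on every event it receives for the claim, propagating ClaimFailedError to the application's retry loop.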
/bug