[Bug] k8s_helper.py ignores failure conditions and hangs for a full 180 second timeout

client/k8s_helper.py watches the SandboxClaim resource until it resolves the underlying sandbox name, but it ignores failure conditions reported by the controller. As a result, it hangs for the full 180-second timeout even when the claim has already failed (e.g., with a ReconcilerError due to missing warm pods when defaulting to cold pods).

During a large load test, the SandboxClaim controller raised ReconcilerError on several claims. The client did not detect these failures, which caused multiple 180-second hangs.

Client should fail fast and retry:
resolve_sandbox_name should actively check the claim's Ready condition. If it is False (ReconcilerError or Failed), the method should raise an exception immediately, so the application fails fast and triggers the new retry loop instead of staying stuck in "Created" status.
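
A minimal sketch of the condition check proposed above, not the actual k8s_helper.py code. It assumes the claim object arrives as a plain dict (as a watch event would deliver it) and that its status.conditions follow the usual Kubernetes condition shape (type, status, reason, message). The names check_claim_ready, SandboxClaimFailedError, and TERMINAL_REASONS are hypothetical, and treating only ReconcilerError and Failed as terminal is an assumption drawn from this report.

```python
from typing import Any, Mapping

# Assumption: reasons treated as terminal, taken from this issue's text.
TERMINAL_REASONS = frozenset({"ReconcilerError", "Failed"})


class SandboxClaimFailedError(RuntimeError):
    """Hypothetical exception for a claim that reports a terminal failure."""


def check_claim_ready(claim: Mapping[str, Any]) -> bool:
    """Return True if the claim is Ready, False if it is still pending.

    Raises SandboxClaimFailedError when the Ready condition is False
    with a reason listed in TERMINAL_REASONS.
    """
    # Tolerate a missing or empty status, which is normal right after creation.
    conditions = (claim.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") != "Ready":
            continue
        status = cond.get("status")
        if status == "True":
            return True
        if status == "False" and cond.get("reason") in TERMINAL_REASONS:
            raise SandboxClaimFailedError(
                f"SandboxClaim failed: {cond.get('reason')}: "
                f"{cond.get('message', '')}"
            )
    # No conclusive Ready condition yet; the caller keeps watching.
    return False
```

The watch loop in resolve_sandbox_name could call such a check on every event, before looking for the sandbox name, so that a failed claim surfaces as an exception right away instead of running out the 180-second timeout.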

/bug


[Bug] k8s_helper.py ignores failure conditions and hangs for a full 180 second timeout #574
