fix: tolerate transient gateway 5xx lane-wide; de-mask live CI #134
Merged
Live keyed lane went red again on a transient prod 504, this time in test_cex_candle (ServerError 504 "upstream request timeout"). #133's per-endpoint live_call wrapping only covered TestPremium, so the same infra flakiness resurfaced on the next unwrapped call. The run still showed green: job-level continue-on-error masked the failed job.

- conftest: replace the per-call live_call helper with an autouse fixture, _tolerate_transient_gateway, which monkeypatches API.send_request (the one method every endpoint inherits) to retry transient 502/503/504 and then pytest.skip. Covers test_call.py, test_integration.py, and future live tests. No-op without a key (keyless/mocked lanes untouched).
- test_integration: revert the 19 live_call(lambda: ...) premium wraps to direct calls; drop the now-dead import.
- live-tests.yml: move continue-on-error from the job to the pytest step so the job conclusion stays honest (setup failures still go red) while the push stays non-blocking; add a step that annotates and writes a run-summary warning when the live suite fails, so failures are visible, not silent.

Keyless offline suite unchanged: 134 passed, 11 skipped.
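For reviewers who want the shape of the conftest change without opening the diff, here is a minimal sketch of retry-then-skip at a single request boundary. It is not the code in tests/conftest.py: ServerError below is a local stand-in that assumes the (status, message) shape seen in the failure above, and tolerate_transient, give_up, attempts, backoff and sleep are hypothetical names chosen for illustration.

```python
import time

# Gateway statuses treated as transient infrastructure noise.
TRANSIENT_STATUSES = frozenset({502, 503, 504})


class ServerError(Exception):
    """Stand-in for the client's error type; assumes args == (status, message)."""


def tolerate_transient(send, give_up, attempts=3, backoff=1.0, sleep=time.sleep):
    """Wrap `send` so transient 5xx responses are retried, then handed to `give_up`.

    `give_up` is expected to raise (for example pytest.skip); if it returns,
    the last transient error is re-raised so a failure is never swallowed.
    """

    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(1, attempts + 1):
            try:
                return send(*args, **kwargs)
            except ServerError as exc:
                status = exc.args[0] if exc.args else None
                if status not in TRANSIENT_STATUSES:
                    raise  # real failures must still fail the test
                last_error = exc
                if attempt < attempts:
                    sleep(backoff * attempt)  # linear backoff: 1x, 2x, ...
        give_up(f"transient gateway error after {attempts} attempts: {last_error}")
        raise last_error

    return wrapper
```

In a conftest, a wrapper like this would presumably be installed by an autouse fixture with monkeypatch.setattr(API, "send_request", tolerate_transient(API.send_request, give_up=pytest.skip)), and only when API_KEY is set. Passing the skip function in as a parameter keeps the retry logic testable without a network or a live key.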
Summary
The Live keyed tests lane failed again on a transient prod 504; this run it hit test_cex_candle (ServerError: (504, 'upstream request timeout')). #133 added transient-5xx tolerance but wired it only into TestPremium, so the same prod-wide gateway flakiness resurfaced on the next unwrapped call. Worse, the run reported green: job-level continue-on-error masked the failed job, so it slipped by silently.

This moves tolerance to the HTTP boundary (can't be whack-a-mole'd again) and makes lane failures visible without gating the push.
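The visibility half of the change can be sketched in Python. The step actually added to live-tests.yml is not reproduced here; this sketch only illustrates the two GitHub Actions mechanisms involved: a ::warning workflow command printed to stdout and a line appended to the file named by GITHUB_STEP_SUMMARY. The function report_live_failure, its parameters and the message wording are hypothetical, and it assumes the caller passes in the pytest step's outcome, which stays "failure" even when continue-on-error lets the job carry on.

```python
import os
import sys


def report_live_failure(outcome, summary_path=None, out=None):
    """Surface a non-blocking live-suite failure; return True if one was reported."""
    if outcome != "failure":
        return False

    out = out if out is not None else sys.stdout
    message = "Live keyed tests failed (non-blocking); see the pytest step log."

    # Workflow command: shows up as a warning annotation on the run.
    print(f"::warning title=Live keyed tests::{message}", file=out)

    # Job summary: Markdown appended to the file GitHub provides for this step.
    path = summary_path or os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(f"> **Warning:** {message}\n")
    return True
```

Outside GitHub Actions, GITHUB_STEP_SUMMARY is normally unset, so the function only prints the annotation line, which keeps it harmless to run locally.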
Changes
- tests/conftest.py: replace the per-call live_call() helper with an autouse fixture, _tolerate_transient_gateway, that monkeypatches API.send_request (the single method every endpoint object inherits) to retry transient 502/503/504 with linear backoff, then pytest.skip. Covers test_call.py, all of test_integration.py, and any future live test. Gated on API_KEY, so it is a no-op for keyless/mocked lanes.
- tests/test_integration.py: revert the 19 live_call(lambda: …) premium wraps from #133 ("test: tolerate transient gateway 504 on live premium lane") to direct calls; drop the now-dead import. No double-wrapping.
- .github/workflows/live-tests.yml: move continue-on-error from the job to the pytest step so the job conclusion stays honest (setup/dependency failures still go red) while the push stays non-blocking; add a step that emits a ::warning annotation plus a run-summary line when the live suite fails, so a red live lane is visible instead of a green run hiding it.

Tests
- Keyless offline suite unchanged: 134 passed, 11 skipped (pytest -m "not integration").
- black --check + flake8 clean on touched files; workflow YAML parses.

Known limitations
live_call. It'll be validated by the live lane on merge to main.