Add HTTP client timeout to all AWS SDK clients#4677
Add HTTP client timeout to all AWS SDK clients#4677haouc wants to merge 1 commit intokubernetes-sigs:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: haouc The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @haouc. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Pull request overview
Sets a default HTTP request timeout for all AWS SDK v2 clients to prevent indefinite hangs when AWS endpoints (or IMDS) become unresponsive.
Changes:
- Configure a 10s
http.Clienttimeout inGenerateAWSConfigso all service clients inherit it (including assumed-role clients). - Configure the same timeout for the EC2 IMDS config path, extracting
buildEC2MetadataConfigfor testability. - Add unit tests to assert both config paths set the HTTP client timeout.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/aws/aws_config.go | Adds shared SDK HTTP client timeout configuration via config.WithHTTPClient. |
| pkg/aws/cloud.go | Refactors EC2 metadata config creation into buildEC2MetadataConfig and applies HTTP timeout. |
| pkg/aws/cloud_test.go | Adds a unit test validating IMDS config path sets the HTTP timeout. |
| pkg/aws/aws_config_test.go | Adds a unit test validating GenerateAWSConfig sets the HTTP timeout. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| } | ||
|
|
||
| func Test_buildEC2MetadataConfig_SetsHTTPClientTimeout(t *testing.T) { |
There was a problem hiding this comment.
Test_buildEC2MetadataConfig_SetsHTTPClientTimeout may be flaky/slow in CI because config.LoadDefaultConfig will try to resolve a region when none is configured (often via IMDS). To keep this unit test hermetic, set AWS_REGION/AWS_DEFAULT_REGION in the test (e.g., t.Setenv) before calling buildEC2MetadataConfig.
| func Test_buildEC2MetadataConfig_SetsHTTPClientTimeout(t *testing.T) { | |
| func Test_buildEC2MetadataConfig_SetsHTTPClientTimeout(t *testing.T) { | |
| t.Setenv("AWS_REGION", "us-west-2") | |
| t.Setenv("AWS_DEFAULT_REGION", "us-west-2") |
| httpClient, ok := cfg.HTTPClient.(*http.Client) | ||
| assert.True(t, ok, "HTTPClient should be *http.Client") | ||
| assert.Equal(t, defaultAWSSDKClientTimeout, httpClient.Timeout) |
There was a problem hiding this comment.
This test uses assert.True on the type assertion, but then dereferences httpClient.Timeout unconditionally. If the assertion fails, httpClient will be nil and the test will panic; use require.True/require.NotNil (or guard the dereference) so failures are reported cleanly.
| defaultOpts := []func(*config.LoadOptions) error{ | ||
| config.WithRegion(gen.cfg.Region), | ||
| config.WithHTTPClient(&http.Client{ | ||
| Timeout: defaultAWSSDKClientTimeout, | ||
| }), |
There was a problem hiding this comment.
The same http.Client construction (Timeout: defaultAWSSDKClientTimeout) is now duplicated here and in buildEC2MetadataConfig (pkg/aws/cloud.go). Consider extracting a small helper (e.g., newAWSSDKHTTPClient()) so future timeout/transport tweaks don’t drift between the main SDK config and the IMDS config.
| const ( | ||
| userAgent = "elbv2.k8s.aws" | ||
| // defaultAWSSDKClientTimeout is the timeout for individual HTTP requests made by AWS SDK clients. | ||
| defaultAWSSDKClientTimeout = 10 * time.Second |
There was a problem hiding this comment.
10s is reasonable for individual requests, but since the LBC talks to 9+ AWS services with varying response profiles, I'd prefer if we can have a slightly larger value (like 15s/20s). Also I'm wondering how it handles throttling situation, or because it's per request timeout, won't affect the controller behavior when API is throttling
There was a problem hiding this comment.
Throttling will be errored out quickly and having a timeout shouldn't be impacting existing behaviors on throttling.
There was a problem hiding this comment.
We want to fail fast and 15 sec is a safe compromise and looks fine to me.
|
/ok-to-test |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Issue
Description
AWS SDK v2 defaults to no HTTP request timeout, which means API calls can hang indefinitely if AWS is unresponsive. Set a 10-second timeout on all SDK clients by configuring an http.Client in GenerateAWSConfig and in the EC2 IMDS config.
This covers all 9 service clients (EC2, ELBV2, ACM, WAFv2, WAFRegional, Shield, RGT, STS, GlobalAccelerator), the EC2 IMDS client, and any dynamically-created assumed-role ELBV2 clients.
Extracted buildEC2MetadataConfig from NewCloud for testability. Added unit tests to verify both config paths set the timeout.
The in-cluster test confirmed the controller with the 10s HTTP client timeout starts up cleanly, successfully makes AWS API calls, acquires leader election, and runs all three controllers (ingress, service, targetGroupBinding) with zero errors or restarts.
Checklist
README.md, or thedocsdirectory)BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯