Skip to content

Add HTTP client timeout to all AWS SDK clients#4677

Open
haouc wants to merge 1 commit intokubernetes-sigs:mainfrom
haouc:main
Open

Add HTTP client timeout to all AWS SDK clients#4677
haouc wants to merge 1 commit intokubernetes-sigs:mainfrom
haouc:main

Conversation

@haouc
Copy link
Copy Markdown
Contributor

@haouc haouc commented Apr 7, 2026

Issue

Description

AWS SDK v2 defaults to no HTTP request timeout, which means API calls can hang indefinitely if AWS is unresponsive. Set a 10-second timeout on all SDK clients by configuring an http.Client in GenerateAWSConfig and in the EC2 IMDS config.

This covers all 9 service clients (EC2, ELBV2, ACM, WAFv2, WAFRegional, Shield, RGT, STS, GlobalAccelerator), the EC2 IMDS client, and any dynamically-created assumed-role ELBV2 clients.

Extracted buildEC2MetadataConfig from NewCloud for testability. Added unit tests to verify both config paths set the timeout.

The in-cluster test confirmed the controller with the 10s HTTP client timeout starts up cleanly, successfully makes AWS API calls, acquires leader election, and runs all three controllers (ingress, service, targetGroupBinding) with zero errors or restarts.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot k8s-ci-robot requested review from oliviassss and shuqz April 7, 2026 21:54
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: haouc
Once this PR has been reviewed and has the lgtm label, please assign wweiwei-li for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 7, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @haouc. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 7, 2026
@oliviassss oliviassss requested a review from Copilot April 7, 2026 22:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Sets a default HTTP request timeout for all AWS SDK v2 clients to prevent indefinite hangs when AWS endpoints (or IMDS) become unresponsive.

Changes:

  • Configure a 10s http.Client timeout in GenerateAWSConfig so all service clients inherit it (including assumed-role clients).
  • Configure the same timeout for the EC2 IMDS config path, extracting buildEC2MetadataConfig for testability.
  • Add unit tests to assert both config paths set the HTTP client timeout.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
pkg/aws/aws_config.go Adds shared SDK HTTP client timeout configuration via config.WithHTTPClient.
pkg/aws/cloud.go Refactors EC2 metadata config creation into buildEC2MetadataConfig and applies HTTP timeout.
pkg/aws/cloud_test.go Adds a unit test validating IMDS config path sets the HTTP timeout.
pkg/aws/aws_config_test.go Adds a unit test validating GenerateAWSConfig sets the HTTP timeout.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/aws/cloud_test.go
}
}

func Test_buildEC2MetadataConfig_SetsHTTPClientTimeout(t *testing.T) {
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test_buildEC2MetadataConfig_SetsHTTPClientTimeout may be flaky/slow in CI because config.LoadDefaultConfig will try to resolve a region when none is configured (often via IMDS). To keep this unit test hermetic, set AWS_REGION/AWS_DEFAULT_REGION in the test (e.g., t.Setenv) before calling buildEC2MetadataConfig.

Suggested change
func Test_buildEC2MetadataConfig_SetsHTTPClientTimeout(t *testing.T) {
func Test_buildEC2MetadataConfig_SetsHTTPClientTimeout(t *testing.T) {
t.Setenv("AWS_REGION", "us-west-2")
t.Setenv("AWS_DEFAULT_REGION", "us-west-2")

Copilot uses AI. Check for mistakes.
Comment thread pkg/aws/cloud_test.go
Comment on lines +147 to +149
httpClient, ok := cfg.HTTPClient.(*http.Client)
assert.True(t, ok, "HTTPClient should be *http.Client")
assert.Equal(t, defaultAWSSDKClientTimeout, httpClient.Timeout)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses assert.True on the type assertion, but then dereferences httpClient.Timeout unconditionally. If the assertion fails, httpClient will be nil and the test will panic; use require.True/require.NotNil (or guard the dereference) so failures are reported cleanly.

Copilot uses AI. Check for mistakes.
Comment thread pkg/aws/aws_config.go Outdated
Comment on lines +48 to +52
defaultOpts := []func(*config.LoadOptions) error{
config.WithRegion(gen.cfg.Region),
config.WithHTTPClient(&http.Client{
Timeout: defaultAWSSDKClientTimeout,
}),
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same http.Client construction (Timeout: defaultAWSSDKClientTimeout) is now duplicated here and in buildEC2MetadataConfig (pkg/aws/cloud.go). Consider extracting a small helper (e.g., newAWSSDKHTTPClient()) so future timeout/transport tweaks don’t drift between the main SDK config and the IMDS config.

Copilot uses AI. Check for mistakes.
Comment thread pkg/aws/aws_config.go
const (
userAgent = "elbv2.k8s.aws"
// defaultAWSSDKClientTimeout is the timeout for individual HTTP requests made by AWS SDK clients.
defaultAWSSDKClientTimeout = 10 * time.Second
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10s is reasonable for individual requests, but since the LBC talks to 9+ AWS services with varying response profiles, I'd prefer if we can have a slightly larger value (like 15s/20s). Also I'm wondering how it handles throttling situation, or because it's per request timeout, won't affect the controller behavior when API is throttling

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throttling will be errored out quickly and having a timeout shouldn't be impacting existing behaviors on throttling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to fail fast and 15 sec is a safe compromise and looks fine to me.

@shraddhabang
Copy link
Copy Markdown
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 10, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants