[4244] fix(sdk): scope AWS credentials per request to stop Bedrock leak#4797
Conversation
Per-request AWS credentials were installed by mutating process-global os.environ around an awaited mockllm.acompletion(...) call. Because the event loop can switch coroutines while those variables are set, overlapping requests in the same worker could observe each other's credentials. The resolved credentials are already forwarded to LiteLLM as request-scoped aws_* kwargs via **provider_settings, so the env-mutation context manager is redundant and is the source of the leak. Normalize the AWS credential and region aliases (env-style uppercase, aws_region vs aws_region_name) into LiteLLM's canonical aws_* params so credentials stay request-scoped without touching os.environ, and remove user_aws_credentials_from, ENV_KEYS_TO_CLEAR, ENV_KEYS_FROM_USER, and _coerce_credentials. Closes Agenta-AI#4244
|
@Koushik-Salammagari is attempting to deploy a commit to the agenta projects Team on Vercel. A member of the Team first needs to authorize it. |
|
✅ Thanks @Koushik-Salammagari! This PR now meets the contribution requirements and has been reopened. A maintainer will review it soon. |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
Disabled knowledge base sources:
📝 WalkthroughSummary by CodeRabbit
WalkthroughRemoves the ChangesAWS Credential Env-Mutation Replacement
Sequence Diagram(s)sequenceDiagram
participant Caller
participant handlers as handlers.py
participant normalize as _normalize_aws_provider_settings
participant acompletion as mockllm.acompletion
Caller->>handlers: auto_ai_critique_v0(provider_settings)
handlers->>normalize: _normalize_aws_provider_settings(provider_settings)
normalize-->>handlers: {aws_access_key_id, aws_secret_access_key, aws_region_name, ...}
handlers->>acompletion: await acompletion(**canonical_aws_kwargs)
acompletion-->>handlers: LLM response
handlers-->>Caller: critique result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
Concurrent Bedrock calls could leak AWS credentials between requests. Per-request credentials were installed by mutating process-global
os.environ(user_aws_credentials_from) around an awaitedmockllm.acompletion(...)call. Because the event loop can switch coroutines while those variables are set, overlapping requests in the same worker could observe each other's credentials — causing auth failures, billing against the wrong account, or signing requests against the wrong AWS account on ECS/Lambda.Root cause: the env mutation was redundant. The resolved AWS credentials are already forwarded to LiteLLM as request-scoped
aws_*kwargs via**provider_settingsat both call sites. LiteLLM resolves Bedrock/Sagemaker auth from those explicit kwargs, so the only thing theos.environmutation added was the cross-request leak.This PR keeps credentials request-scoped without touching
os.environ:_normalize_aws_provider_settings(...), which folds AWS credential/region aliases (env-style uppercaseAWS_ACCESS_KEY_ID, andaws_region/aws_default_regionvsaws_region_name) into LiteLLM's canonicalaws_*params, and drops the non-canonical aliases so they aren't forwarded as unrecognized kwargs. This preserves the region/casing handling the old env path provided.user_aws_credentials_fromcontext manager and itsENV_KEYS_TO_CLEAR/ENV_KEYS_FROM_USERlists frommockllm.py, and the now-unused_coerce_credentialshelper.Non-AWS providers are unaffected: settings without
aws_*keys pass through untouched.Testing
Verified locally
uv run pytest oss/tests/pytest/unit/— 613 passed.uv run pytest oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py oss/tests/pytest/unit/test_chat_v0_inputs.py oss/tests/pytest/unit/test_prompt_template_extensions.py— the directly affected modules pass.ruff formatandruff checkclean on all changed files.Added or updated tests
test_aws_credential_aliases_normalized_to_litellm_params— env-style uppercase keys andaws_regionare forwarded under LiteLLM's canonicalaws_*params, and non-canonical aliases are not leaked as kwargs.test_aws_credentials_do_not_mutate_environ— resolving AWS credentials leavesos.environuntouched.user_aws_credentials_fromto drop the now-removed boundary.QA follow-up
I could not exercise live AWS paths locally. Please QA with user-provided credentials on an ECS/Lambda deployment with an attached IAM role to confirm no fallback to the instance/container role:
invoke), Bedrock Converse, Bedrock streaming, and Sagemaker.Demo
This is a backend/SDK change with no UI surface, so the demo is the regression suite that pins the fix. The two tests below encode the acceptance criteria from #4244 — credentials stay request-scoped (
aws_*kwargs, no env mutation):test_aws_credentials_do_not_mutate_environis the direct regression for the leak: it snapshotsos.environbefore the awaited call and asserts it is unchanged afterwards. Onmain(env-mutation path) this assertion fails; with this PR it passes.test_aws_credential_aliases_normalized_to_litellm_paramsproves the credentials still reach LiteLLM — now as request-scopedaws_*kwargs rather than via the environment.Checklist
Closes #4244