Skip to content

[4244] fix(sdk): scope AWS credentials per request to stop Bedrock leak#4797

Open
Koushik-Salammagari wants to merge 1 commit into
Agenta-AI:mainfrom
Koushik-Salammagari:fix/4244-bedrock-credential-leakage
Open

[4244] fix(sdk): scope AWS credentials per request to stop Bedrock leak#4797
Koushik-Salammagari wants to merge 1 commit into
Agenta-AI:mainfrom
Koushik-Salammagari:fix/4244-bedrock-credential-leakage

Conversation

@Koushik-Salammagari

@Koushik-Salammagari Koushik-Salammagari commented Jun 22, 2026

Copy link
Copy Markdown

Summary

Concurrent Bedrock calls could leak AWS credentials between requests. Per-request credentials were installed by mutating process-global os.environ (user_aws_credentials_from) around an awaited mockllm.acompletion(...) call. Because the event loop can switch coroutines while those variables are set, overlapping requests in the same worker could observe each other's credentials — causing auth failures, billing against the wrong account, or signing requests against the wrong AWS account on ECS/Lambda.

Root cause: the env mutation was redundant. The resolved AWS credentials are already forwarded to LiteLLM as request-scoped aws_* kwargs via **provider_settings at both call sites. LiteLLM resolves Bedrock/Sagemaker auth from those explicit kwargs, so the only thing the os.environ mutation added was the cross-request leak.

This PR keeps credentials request-scoped without touching os.environ:

  • Adds _normalize_aws_provider_settings(...), which folds AWS credential/region aliases (env-style uppercase AWS_ACCESS_KEY_ID, and aws_region / aws_default_region vs aws_region_name) into LiteLLM's canonical aws_* params, and drops the non-canonical aliases so they aren't forwarded as unrecognized kwargs. This preserves the region/casing handling the old env path provided.
  • Removes the user_aws_credentials_from context manager and its ENV_KEYS_TO_CLEAR / ENV_KEYS_FROM_USER lists from mockllm.py, and the now-unused _coerce_credentials helper.
  • When a user supplies their own keys, LiteLLM builds a session from those static credentials and does not fall back to the container/instance role — so the previous "clear role creds" behavior is retained implicitly, per request.

Non-AWS providers are unaffected: settings without aws_* keys pass through untouched.

Testing

Verified locally

  • uv run pytest oss/tests/pytest/unit/613 passed.
  • uv run pytest oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py oss/tests/pytest/unit/test_chat_v0_inputs.py oss/tests/pytest/unit/test_prompt_template_extensions.py — the directly affected modules pass.
  • ruff format and ruff check clean on all changed files.

Added or updated tests

  • test_aws_credential_aliases_normalized_to_litellm_params — env-style uppercase keys and aws_region are forwarded under LiteLLM's canonical aws_* params, and non-canonical aliases are not leaked as kwargs.
  • test_aws_credentials_do_not_mutate_environ — resolving AWS credentials leaves os.environ untouched.
  • Updated the three unit tests that mocked user_aws_credentials_from to drop the now-removed boundary.

QA follow-up

I could not exercise live AWS paths locally. Please QA with user-provided credentials on an ECS/Lambda deployment with an attached IAM role to confirm no fallback to the instance/container role:

  • Bedrock (legacy invoke), Bedrock Converse, Bedrock streaming, and Sagemaker.
  • A concurrency check: two simultaneous requests with different credentials in one worker must not cross-contaminate.

Demo

This is a backend/SDK change with no UI surface, so the demo is the regression suite that pins the fix. The two tests below encode the acceptance criteria from #4244 — credentials stay request-scoped (aws_* kwargs, no env mutation):

Regression tests for the Bedrock credential leak passing

$ uv run pytest oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py -v \
    -k "aws_credential_aliases or aws_credentials_do_not_mutate"

============================= test session starts ==============================
platform darwin -- Python 3.12.11, pytest-9.1.0, pluggy-1.6.0
collecting ... collected 42 items / 40 deselected / 2 selected

test_auto_ai_critique_v0_runtime.py::test_aws_credential_aliases_normalized_to_litellm_params PASSED [ 50%]
test_auto_ai_critique_v0_runtime.py::test_aws_credentials_do_not_mutate_environ            PASSED [100%]

======================= 2 passed, 40 deselected in 1.53s =======================
  • test_aws_credentials_do_not_mutate_environ is the direct regression for the leak: it snapshots os.environ before the awaited call and asserts it is unchanged afterwards. On main (env-mutation path) this assertion fails; with this PR it passes.
  • test_aws_credential_aliases_normalized_to_litellm_params proves the credentials still reach LiteLLM — now as request-scoped aws_* kwargs rather than via the environment.

Checklist

  • I have included a demo (regression test run above); the change has no UI surface
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA

Closes #4244

Per-request AWS credentials were installed by mutating process-global
os.environ around an awaited mockllm.acompletion(...) call. Because the
event loop can switch coroutines while those variables are set, overlapping
requests in the same worker could observe each other's credentials.

The resolved credentials are already forwarded to LiteLLM as request-scoped
aws_* kwargs via **provider_settings, so the env-mutation context manager is
redundant and is the source of the leak. Normalize the AWS credential and
region aliases (env-style uppercase, aws_region vs aws_region_name) into
LiteLLM's canonical aws_* params so credentials stay request-scoped without
touching os.environ, and remove user_aws_credentials_from, ENV_KEYS_TO_CLEAR,
ENV_KEYS_FROM_USER, and _coerce_credentials.

Closes Agenta-AI#4244
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 22, 2026
@vercel

vercel Bot commented Jun 22, 2026

Copy link
Copy Markdown

@Koushik-Salammagari is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

@dosubot dosubot Bot added bug Something isn't working python Pull requests that update Python code SDK tests labels Jun 22, 2026
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

✅ Thanks @Koushik-Salammagari! This PR now meets the contribution requirements and has been reopened. A maintainer will review it soon.

@github-actions github-actions Bot added the incomplete-pr PR is missing required template sections or a demo recording label Jun 22, 2026
@github-actions github-actions Bot closed this Jun 22, 2026
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 31b38731-9717-4d1c-bd6d-a49d1679fd51

📥 Commits

Reviewing files that changed from the base of the PR and between 8b7e319 and 16de67d.

📒 Files selected for processing (5)
  • sdks/python/agenta/sdk/engines/running/handlers.py
  • sdks/python/agenta/sdk/litellm/mockllm.py
  • sdks/python/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py
  • sdks/python/oss/tests/pytest/unit/test_chat_v0_inputs.py
  • sdks/python/oss/tests/pytest/unit/test_prompt_template_extensions.py

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.


📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Improved AWS credential handling in LLM operations to properly normalize credential aliases into canonical parameter names, ensuring consistent behavior across different credential specification formats.
    • Fixed issue where environment variables were being mutated during AWS credential resolution.
    • Enhanced stability and reliability of auto_ai_critique and LLM retry flows with more robust credential processing.

Walkthrough

Removes the user_aws_credentials_from context manager that injected AWS credentials into os.environ around awaited LLM calls, replacing it with a new _normalize_aws_provider_settings helper that resolves credential and region aliases into LiteLLM-canonical request-scoped aws_* kwargs. Both auto_ai_critique_v0 and _run_prompt_llm_config_with_retry are updated to use the new approach, and tests are updated accordingly.

Changes

AWS Credential Env-Mutation Replacement

Layer / File(s) Summary
_normalize_aws_provider_settings helper and removal of env-mutation approach
sdks/python/agenta/sdk/engines/running/handlers.py, sdks/python/agenta/sdk/litellm/mockllm.py
Introduces _AWS_PARAM_ALIASES and _normalize_aws_provider_settings to resolve multiple credential/region alias keys (e.g., AWS_ACCESS_KEY_ID, aws_region) into LiteLLM canonical aws_* kwargs, dropping non-canonical keys. Removes _coerce_credentials, user_aws_credentials_from, ENV_KEYS_TO_CLEAR, ENV_KEYS_FROM_USER, and the contextmanager import from mockllm.py.
Call site updates in auto_ai_critique_v0 and _run_prompt_llm_config_with_retry
sdks/python/agenta/sdk/engines/running/handlers.py
Both LLM call sites now pass **_normalize_aws_provider_settings(provider_settings) directly to mockllm.acompletion instead of wrapping calls in the removed context manager.
Test fixture cleanup and new credential normalization tests
sdks/python/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.py, sdks/python/oss/tests/pytest/unit/test_chat_v0_inputs.py, sdks/python/oss/tests/pytest/unit/test_prompt_template_extensions.py
Removes all monkeypatches of user_aws_credentials_from and the _noop_aws_credentials helper from three test modules. Adds test_aws_credential_aliases_normalized_to_litellm_params asserting alias keys become canonical LiteLLM kwargs, and test_aws_credentials_do_not_mutate_environ asserting os.environ is unchanged before and after the awaited call.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant handlers as handlers.py
  participant normalize as _normalize_aws_provider_settings
  participant acompletion as mockllm.acompletion

  Caller->>handlers: auto_ai_critique_v0(provider_settings)
  handlers->>normalize: _normalize_aws_provider_settings(provider_settings)
  normalize-->>handlers: {aws_access_key_id, aws_secret_access_key, aws_region_name, ...}
  handlers->>acompletion: await acompletion(**canonical_aws_kwargs)
  acompletion-->>handlers: LLM response
  handlers-->>Caller: critique result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot removed the incomplete-pr PR is missing required template sections or a demo recording label Jun 22, 2026
@github-actions github-actions Bot reopened this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python Pull requests that update Python code SDK size:L This PR changes 100-499 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix credentials leakage between concurrent Bedrock calls

1 participant