feat(sdk): add optional AWS role assumption (Role ARN) for Bedrock/Sagemaker#4800
feat(sdk): add optional AWS role assumption (Role ARN) for Bedrock/Sagemaker#4800Koushik-Salammagari wants to merge 3 commits into
Conversation
Per-request AWS credentials were installed by mutating process-global os.environ around an awaited mockllm.acompletion(...) call. Because the event loop can switch coroutines while those variables are set, overlapping requests in the same worker could observe each other's credentials. The resolved credentials are already forwarded to LiteLLM as request-scoped aws_* kwargs via **provider_settings, so the env-mutation context manager is redundant and is the source of the leak. Normalize the AWS credential and region aliases (env-style uppercase, aws_region vs aws_region_name) into LiteLLM's canonical aws_* params so credentials stay request-scoped without touching os.environ, and remove user_aws_credentials_from, ENV_KEYS_TO_CLEAR, ENV_KEYS_FROM_USER, and _coerce_credentials. Closes Agenta-AI#4244
Adds optional IAM role assumption for AWS providers. When a Bedrock or Sagemaker secret carries an `aws_role_arn`, the long-lived keys sign a single `sts:AssumeRole` call and the short-lived session credentials it returns are what reach LiteLLM. SDK: - `_resolve_aws_role_arn` exchanges the role ARN for temporary STS credentials and composes with `_normalize_aws_provider_settings`, so resolution stays request-scoped and never mutates `os.environ` (builds on the Agenta-AI#4244 fix). The role ARN is consumed and never forwarded to LiteLLM as an unknown kwarg. - Promote `boto3` from the dev group to a runtime dependency; relock. Frontend: - Add an optional "Role ARN" field to the Configure Provider drawer for Bedrock and Sagemaker, wired `roleArn` <-> `aws_role_arn` through the `LlmProvider` type and the secret transforms. - Fix the drawer forcing `required: false` fields to required when a known provider is selected.
|
@Koushik-Salammagari is attempting to deploy a commit to the agenta projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughAdds AWS IAM role ARN support for Bedrock/SageMaker providers. The Python SDK replaces the env-var-based ChangesAWS Role ARN Support for Bedrock/SageMaker
Sequence Diagram(s)sequenceDiagram
participant handlers as handlers.py
participant resolver as _resolve_aws_role_arn
participant sts as boto3 STS
participant normalizer as _normalize_aws_provider_settings
participant litellm as mockllm.acompletion
handlers->>resolver: provider_settings (may include aws_role_arn)
alt aws_role_arn present
resolver->>sts: assume_role(RoleArn, RoleSessionName)
sts-->>resolver: temporary AccessKeyId / SecretAccessKey / SessionToken
resolver-->>handlers: settings with temporary credentials, role keys dropped
else no role ARN
resolver-->>handlers: settings unchanged
end
handlers->>normalizer: resolved settings
normalizer-->>handlers: canonical aws_access_key_id / aws_secret_access_key / aws_session_token / aws_region_name kwargs
handlers->>litellm: acompletion(**canonical_aws_kwargs)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdks/python/oss/tests/pytest/unit/test_resolve_aws_role_arn.py (1)
172-192: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winAdd a regression test for blank
aws_role_arn/AWS_ROLE_ARN.This suite misses the empty-string role ARN edge case, which is where alias leakage can occur. Pinning that behavior will prevent regressions.
💡 Proposed test
+def test_blank_role_arn_is_not_forwarded(): + settings = { + "aws_role_arn": "", + "aws_access_key_id": "BASE_KEY", + "aws_secret_access_key": "BASE_SECRET", + } + normalized = _normalize_aws_provider_settings(_resolve_aws_role_arn(settings)) + assert "aws_role_arn" not in normalized + assert "AWS_ROLE_ARN" not in normalized
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1c456586-f2cd-44db-90cd-8aa422984d99
⛔ Files ignored due to path filters (3)
api/uv.lockis excluded by!**/*.locksdks/python/uv.lockis excluded by!**/*.lockservices/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
sdks/python/agenta/sdk/engines/running/handlers.pysdks/python/agenta/sdk/litellm/mockllm.pysdks/python/oss/tests/pytest/unit/test_auto_ai_critique_v0_runtime.pysdks/python/oss/tests/pytest/unit/test_chat_v0_inputs.pysdks/python/oss/tests/pytest/unit/test_prompt_template_extensions.pysdks/python/oss/tests/pytest/unit/test_resolve_aws_role_arn.pysdks/python/pyproject.tomlweb/oss/src/components/ModelRegistry/Drawers/ConfigureProviderDrawer/assets/ConfigureProviderDrawerContent.tsxweb/oss/src/components/ModelRegistry/Drawers/ConfigureProviderDrawer/assets/constants.tsweb/packages/agenta-entities/src/secret/core/transforms.tsweb/packages/agenta-shared/src/types/llmProvider.ts
💤 Files with no reviewable changes (3)
- sdks/python/oss/tests/pytest/unit/test_chat_v0_inputs.py
- sdks/python/oss/tests/pytest/unit/test_prompt_template_extensions.py
- sdks/python/agenta/sdk/litellm/mockllm.py
Address CodeRabbit review on Agenta-AI#4800: - Drop empty aws_role_arn/AWS_ROLE_ARN aliases before the early return so a blank UI field never leaks to LiteLLM as an unknown kwarg. - Resolve the role ARN via asyncio.to_thread at both async call sites so the blocking boto3 sts:AssumeRole call cannot stall the event loop. - Add regression tests for blank role-ARN values (both casings, '' and None).
Summary
Adds optional AWS IAM role assumption for Bedrock and Sagemaker providers. When a
provider secret carries an
aws_role_arn, the long-lived keys are used only to sign asingle
sts:AssumeRolecall, and the short-lived session credentials it returns are whatget forwarded to LiteLLM. This lets users connect Bedrock/Sagemaker with a role to assume
instead of (or on top of) static keys.
SDK (
sdks/python/agenta/sdk/engines/running/handlers.py)_resolve_aws_role_arn(...): if a role ARN is present (eitheraws_role_arnorAWS_ROLE_ARN), it builds an STS client from the base credentials, assumes the role,and replaces the credential set with the temporary session credentials. The role ARN is
consumed and never forwarded to LiteLLM as an unknown kwarg. Region resolves from the
usual aliases (
aws_region_name/aws_region/AWS_REGION/aws_default_region),defaulting to
us-east-1._normalize_aws_provider_settings, so resolution stays request-scopedand never mutates
os.environ. Bothauto_ai_critique_v0and_run_prompt_llm_config_with_retrycall sites are wired through it.boto3is promoted from the dev group to a runtime dependency (re-locked:sdks/python,api,services).Frontend
Sagemaker, wiring
roleArn⇄aws_role_arnthrough theLlmProvidertype and thesecret transforms.
required: falsewere forced required once aknown provider was selected.
Relationship to existing work
credentialsleakage between concurrent Bedrock calls #4244, the Bedrock credential leak). That PR removed theos.environmutation in favour of request-scopedaws_*kwargs; this feature plugs roleassumption into that design rather than the old env-mutation path. This branch is stacked
on [4244] fix(sdk): scope AWS credentials per request to stop Bedrock leak #4797 — once [4244] fix(sdk): scope AWS credentials per request to stop Bedrock leak #4797 merges, the diff here reduces to just the role-ARN changes.
one-week window in CONTRIBUTING.md). fix: Add optional ARN to AWS Bedrock connection to assume role when u… #4396 layered role assumption on top of the
env-mutation code that Fix
credentialsleakage between concurrent Bedrock calls #4244 is removing; this implementation avoids that and keepscredentials request-scoped. Thanks to @mcjraquel for surfacing the use case.
Testing
Verified locally
uv run pytest oss/tests/pytest/unit/— 628 passed (includes the 15 new tests below).ruff format+ruff check— clean on changed Python files.pnpm exec prettier --checkandeslint— clean on the four changed frontend files.tsc --noEmiton@agenta/oss— no new type errors introduced (diffed against thebase; the pre-existing test-infra errors are unchanged).
uv sync --locked --dry-run(uv 0.11.14, the CI version) — passes forsdks/python,api, andservices.Added tests
sdks/python/oss/tests/pytest/unit/test_resolve_aws_role_arn.py— 15 unit tests coveringthe no-op path, both ARN casings triggering
assume_role, base credentials signing the STSclient, region defaulting/resolution, temp credentials replacing static keys, the role ARN
being dropped, input dict immutability, a clear error when
boto3is missing, andcomposition with
_normalize_aws_provider_settings.Demo
Backend regression suite for the new resolver (
boto3STS mocked):QA follow-up
no
*), but I could not capture a live UI recording locally — please verify the fieldrenders and the form submits with and without a value during review.
call succeeds with the temporary credentials still needs QA (no live AWS access locally).
Checklist
Related: #4244, #4797, #4396