feat: sync AWS Transform agent-plugin from ATXKiroPower (post cli-3.3.0)#209
Conversation
There was a problem hiding this comment.
Semgrep OSS found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
d60ca81 to
b49f9fc
Compare
…3.0) Promote ATXKiroPower/mainline changes merged since the cli-3.3.0 tag: EC2/Batch execution rewrite (CFN + SSM, multi-worker, memory caps, prechecks), scheduling, security agent skill (admin/executor split), mainframe-reimagine workflow, remediation + custom-TD, source handling (GitHub/GitLab/Bitbucket), reporting, telemetry, and lint cleanup. Bump plugin version 1.1.2 -> 1.2.0 (minor; feature release). Public plugin.json schema preserved (author/homepage/keywords/license/repository).
b49f9fc to
0a15e41
Compare
Drop the mainframe reimagine skill files and the verify-traceability.py
script from the post cli-3.3.0 sync while AppSec questions on those
changes are resolved. This lets the continuous-modernization updates and
version bump ship without the mainframe content.
- Delete mainframe-reimagine{,-analysis,-ddd,-specgen,-verify}.md
- Delete skills/aws-transform/scripts/verify-traceability.py
- Revert mainframe.md to its main-branch state so no mainframe changes
remain in this release
scottschreckengaust
left a comment
There was a problem hiding this comment.
Approve — with nits
Reviewed the substantive part of this sync: the embedded CloudFormation/bash in continuous-modernization-ec2-execution.md (default WorkerCount 5→1, the new memory precheck, per-container --memory caps, --restart unless-stopped → on-failure:3, the rewritten health-check loop, the laptop-side/ssm_run memory math, and the new Secrets Manager credential-validation step), plus the skill-instruction reflows in source/remediation/schedule/batch-execution.
The new logic is carefully built. I verified the high-risk mechanics rather than assuming, and they hold up. The items below are hardening nits, not correctness regressions — hence approving.
✅ Verified correct
- Memory math
(MEM_TOTAL_MB - 4096) / WorkerCountwith a 2048 floor: the precheck (WorkerCount*2048 + 4096) guarantees the floor never over-commits the host;--memory-swap == --memorycorrectly disables overflow swap. !Subescaping: every${!bash_var}vs${CfnRef}is correct, including the deliberatewait "${!PIDS[@]}"(renders to literal${PIDS[@]}for bash).- Health-check terminal logic
dead OR (exited AND (ExitCode==0 OR RestartCount>=3))exactly matches its comment and correctly tolerates the transientexitedsnapshot betweenon-failure:3retries. WorkerCount=1default is consistent everywhere — CFN default, MinValue/MaxValue, provisioning script auto-sizing, decision tables, and prose. No stale references to the old default of 5.
⚠️ Nits — silent-failure hardening in the new code (non-blocking)
-
Precheck can fail open if
MemTotalreads empty.MEM_TOTAL_MB=$(( $(awk '/MemTotal/{print $2}' /proc/meminfo) / 1024 ))— ifawkyields nothing, the arithmetic errors butset -edoes not abort;[ "" -lt REQUIRED_MB ]is then false, so UserData prints "PASSED" and continues, and the empty value later poisons--memory="m". This defeats the precheck's stated purpose (fail in seconds). Suggest validatingMEM_TOTAL_MBis a positive integer andexit 1with a clear message otherwise. -
--restart on-failure:3leaves a container permanently down post-provision with no watchdog. Correct at provision time (a clean exit surfaces via the CreationPolicy timeout). But afterCREATE_COMPLETE, a runtime OOM-loop that exhausts 3 retries — or a clean exit — leaves the container dead silently; the next job'sdocker execjust fails. Consider having the Step 9 status path verifyState.Status=runningand message clearly ("container down — retries exhausted") rather than reporting a generic "no AID / still pending". -
Laptop-side
ssm_runmemory-math errors are swallowed. An empty remoteMEM_TOTAL_MB, orWORKER_COUNT=0, makes the remotedocker runfail — but the error lands only inStandardErrorContent, which thessm_runhelper never reads (it fetchesStandardOutputContentonly), so the flow proceeds as if the container launched. Note the${WORKER_COUNT:-1}fallback guards unset/empty but not a literal0. Suggest validating remotely and surfacingStandardErrorContent(and the commandStatus) on non-Success. -
New Step 7 credential snippet masks create failures.
create-secret … 2>/dev/null || put-secret-value …hides anAccessDenied/KMS failure and falls through toput-secret-value, which then fails with a misleadingResourceNotFoundException. The siblingdescribe-secret … 2>&1in the same step surfaces errors correctly — worth making these consistent (only fall back on the specific already-exists condition).
📝 Maintainability advisory
No comment contradicts its code. The one real drift risk is magic numbers duplicated in prose: 2048/4096 (three spots), on-failure:3 coupled to the health check's -ge 3, and the computed "300s" (actually seq 1 60 × sleep 5) stated as a literal. All correct today, but nothing keeps them in sync — consider hoisting shared constants into named vars or adding "keep in sync with X" pointers. Also worth confirming the troubleshooting table's port 8081 is accurate (it's unverifiable from the file alone).
🔭 Out of scope (not introduced by this PR)
Two items I initially surfaced trace to pre-existing code this PR doesn't touch (the remediation fan-out's --ids/--repo interaction, and the ssm_submit fire-and-forget launch not verifying its sentinel). Flagging for a possible separate follow-up — not blockers here.
Nice, defensive work on the memory-capping and health-check rewrite. 👍
Review assisted by Claude Code (pr-review-toolkit).
Promotes
ATXKiroPower/mainlinechanges merged since thecli-3.3.0tag.Themes: Continuous Modernization EC2/Batch execution rewrite (CFN + SSM, multi-worker, memory caps, prechecks), scheduling, security agent skill (admin/executor split), mainframe-reimagine workflow, remediation + custom-TD, source handling (GitHub/GitLab/Bitbucket), reporting, telemetry, and markdown lint/format cleanup.
Version bump: 1.1.2 → 1.2.0 (minor; feature release) across
.claude-plugin/plugin.json,.codex-plugin/plugin.json, and theaws-transformentry in.claude-plugin/marketplace.json.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.