Skip to content

feat(uipath-governance): path to green — operate tests, diagnose structure, description verbs #1651

Merged
t-hsia merged 5 commits into main from feat/governance-path-to-green on Jun 23, 2026

@t-hsia t-hsia commented Jun 23, 2026


Summary

Addresses coverage gaps for Governance (uip gov) on the Coding Agents Scorecard, following the same pattern as the admin path-to-green (#1578).

Current scores: Build 50%, Operate 30%, Diagnose 15%

Changes

SKILL.md updates:

  • Added Operate: and Diagnose: verb phrases to description (856 chars, under 1024 limit)
  • New ### Troubleshoot section in "When to Use" with 6 trigger patterns
  • 3 new Task Navigation rows for diagnose references

references/diagnose/ directory (3 files, follows maestro-bpmn/admin pattern):

  • CAPABILITY.md — structured capability index for diagnose mode (8 triggers, 6 rules, 8 common tasks)
  • references/failure-modes.md — 6 named failure patterns: policy not taking effect, wrong policy applied, access policy too broad/narrow, deployed policy empty, policy create rejected
  • references/troubleshooting-guide.md — 5-step diagnostic priority ladder

New mode:operate smoke tests (3 files — previously zero!):

  • aops_deploy_tenant_smoke — deploy policy to tenant
  • aops_deployed_policy_query_smoke — query effective deployed policy
  • access_evaluate_smoke — evaluate access policy rules

New mode:diagnose smoke tests (2 files):

  • aops_diagnose_policy_not_applied_smoke — investigate policy not taking effect
  • access_diagnose_blocked_invocation_smoke — diagnose blocked tool invocation

Test distribution after changes: 17 total — 5 build, 3 operate, 9 diagnose

Scorecard impact (expected)

| Metric | Before | After (expected) |
| --- | --- | --- |
| Operate tests | 0 | 3 |
| Diagnose tests | 7 | 9 |
| Operate Product Coverage | 30% | ~45% (deploy, query deployed, evaluate) |
| Diagnose Product Coverage | 15% | ~35% (diagnose/ structure + failure modes) |

Not in scope

  • Source Control, CI/CD Pipelines, Feed Management — no uip CLI surface exists
  • These product capabilities create a structural ceiling regardless of skill maturity

Test plan

  • All 17 governance test YAMLs parse as valid YAML
  • All tests have uipath-governance skill tag + mode:* tag
  • New smoke tests validate correct CLI command shapes
  • hooks/validate-skill-descriptions.sh passes
  • diagnose/ reference links resolve to existing files
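The tag checks in the test plan above could be automated once each task YAML is parsed. A minimal sketch, assuming the tags are already available as a plain list of strings per task (where they live in the YAML and how files are loaded are assumptions; `tag_problems` and `mode_counts` are hypothetical helpers). Because the lint review below suggests a task may carry a second mode:* tag, the sketch requires at least one mode tag rather than exactly one.

```python
from collections import Counter

SKILL_TAG = "uipath-governance"
KNOWN_MODES = {"build", "operate", "diagnose"}


def tag_problems(tags: list[str]) -> list[str]:
    """Return human-readable problems with one task's tag list (empty if OK)."""
    problems = []
    if SKILL_TAG not in tags:
        problems.append(f"missing {SKILL_TAG} tag")
    modes = [t.removeprefix("mode:") for t in tags if t.startswith("mode:")]
    if not modes:
        problems.append("missing mode:* tag")
    for mode in modes:
        if mode not in KNOWN_MODES:
            problems.append(f"unknown mode: {mode}")
    return problems


def mode_counts(all_tags: list[list[str]]) -> Counter:
    """Count tasks per mode across a suite; a task with two mode tags counts once per mode."""
    return Counter(
        t.removeprefix("mode:") for tags in all_tags for t in tags if t.startswith("mode:")
    )
```

Running `mode_counts` over all 17 tasks is one way to confirm the stated 5/3/9 build/operate/diagnose split.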

🤖 Generated with Claude Code

…green

Add operate and diagnose verbs to SKILL.md description, a Troubleshoot
subsection under "When to Use", diagnose reference navigation rows, and
the full references/diagnose/ directory (CAPABILITY.md, failure-modes.md,
troubleshooting-guide.md). Add 3 mode:operate smoke tests (deploy tenant,
query deployed policy, evaluate access policy) and 2 mode:diagnose smoke
tests (policy not applied, blocked invocation). Targets Build/Operate/
Diagnose scorecard improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 23, 2026


Claude finished @t-hsia's task in 3m 10s.


PR Review: feat(uipath-governance): path to green

  • Read review criteria and project conventions
  • Read full diff and all changed files
  • Validate skill structure and SKILL.md changes
  • Validate diagnose reference files
  • Validate test task YAMLs
  • Check links, CODEOWNERS, cross-references
  • Post final review

Summary

Adds operate and diagnose coverage for the governance skill: 3 new operate smoke tests, 2 new diagnose smoke tests, a references/diagnose/ directory with capability index + failure modes + troubleshooting guide, description verb additions, and a new Troubleshoot section in SKILL.md. Well-structured PR that follows the established path-to-green pattern.

Change-by-Change Review

1. skills/uipath-governance/SKILL.md — description update + Troubleshoot section

Severity: OK

Description updated from ~690 chars to ~833 chars — well under the 1024 limit. The Operate: and Diagnose: verb phrases are compact and front-load key actions. The new ### Troubleshoot section under "When to Use" adds 6 clear trigger patterns that cover the diagnostic surface. Three new Reference Navigation rows point to the new diagnose files with correct relative paths (from SKILL.md's location in the skill root, ./references/diagnose/... resolves correctly).

2. skills/uipath-governance/references/diagnose/CAPABILITY.md — capability index

Severity: OK

Clean capability index: 8 "When to use" triggers, 6 critical rules, 8 common tasks with deep-links, strong anti-patterns section. Cross-capability links (../aops-policy/..., ../access-policy/..., ../disambiguation-guide.md) all resolve correctly from references/diagnose/. The ../../SKILL.md back-link is correct. Follows the pattern established by other capability indexes in the repo.

3. skills/uipath-governance/references/diagnose/references/failure-modes.md — 6 failure patterns

Severity: High — broken cross-capability links

The failure mode content is excellent — 6 well-structured patterns with symptom/cause/investigation/fix. Every CLI command includes --output json. However, all 6 cross-capability links are broken. The file is at references/diagnose/references/failure-modes.md (two levels below references/), but the links use ../ (one level up = diagnose/) instead of ../../ (two levels up = references/).

Broken links (all need ../../):

  • Line 65: ../aops-policy/aops-policy-deploy-guide.md → resolves to diagnose/aops-policy/... (doesn't exist)
  • Line 102: ../access-policy/plugins/actor/impl.md → resolves to diagnose/access-policy/... (doesn't exist)
  • Line 135: ../access-policy/plugins/tags/planning.md → same issue
  • Line 135: ../access-policy/plugins/selector/impl.md → same issue
  • Line 167: ../aops-policy/aops-policy-deploy-guide.md → same issue
  • Line 200: ../aops-policy/configure-aops-policy-data-guide.md → same issue
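Relative links resolve against the directory of the file that contains them, which is why a single `../` stops inside diagnose/. A quick check with Python's standard `posixpath` module (the `resolve_link` helper is illustrative, and paths are relative to skills/uipath-governance/) shows where each depth lands:

```python
import posixpath

# The file that holds the links, relative to the skill root.
SOURCE = "references/diagnose/references/failure-modes.md"


def resolve_link(source: str, link: str) -> str:
    """Resolve a relative link against the directory of the file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(source), link))


link_tail = "aops-policy/aops-policy-deploy-guide.md"
print(resolve_link(SOURCE, "../" + link_tail))        # references/diagnose/aops-policy/... (broken)
print(resolve_link(SOURCE, "../../" + link_tail))     # references/aops-policy/... (intended)
print(resolve_link(SOURCE, "../../../" + link_tail))  # aops-policy/... (overshoots to the skill root)
```

Two levels up is the only depth that reaches references/, where the aops-policy/ and access-policy/ directories live.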


4. skills/uipath-governance/references/diagnose/references/troubleshooting-guide.md — diagnostic ladder

Severity: OK

5-step sequential triage workflow. Cross-references at lines 19 and 100 correctly use ../../ (two levels up). The step-by-step structure with interpret/verify sections is exactly what an agent needs. Clean and complete.

5. tests/tasks/uipath-governance/aops-policy/aops_deploy_tenant_smoke.yaml

Severity: Low — complex regex, otherwise good

Tags correct (uipath-governance first, smoke, mode:operate). Prompt is goal-oriented with clear auth-error guidance. The command_pattern on line 43 uses multiple lookaheads ((?=.*--tenant-name\s)(?=.*--input\s)(?=.*--output\s+json)) which is correct but fragile — if the agent puts flags before the tenant ID, the .*00000000-0000-0000-0000-000000000099.* prefix won't match. This is acceptable for a smoke test since the natural ordering puts the positional arg first, but worth noting.
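Each lookahead rescans the rest of the command on its own, so flags checked that way can appear in any order; only text matched outside the lookaheads is position-bound. A small illustration with Python's `re`. The command lines are hypothetical, the verb and flags are the ones the review asks to be verified against the CLI, and neither pattern is the one in the YAML:

```python
import re

GUID = "00000000-0000-0000-0000-000000000099"

# Lookaheads assert without consuming text, so the order of the flags
# in the command does not matter.
FLAG_CHECKS = r"(?=.*--tenant-name\s)(?=.*--input\s)(?=.*--output\s+json)"

# Position-bound: the GUID must directly follow the verb.
STRICT = re.compile(r"deployment tenant configure\s+" + re.escape(GUID) + FLAG_CHECKS)
# Order-free: the GUID is checked by one more lookahead.
RELAXED = re.compile(r"deployment tenant configure" + FLAG_CHECKS + rf"(?=.*{re.escape(GUID)})")

# Hypothetical command lines an agent might produce.
positional_first = (
    f"uip gov aops-policy deployment tenant configure {GUID} "
    "--tenant-name Prod --input policy.json --output json"
)
flags_first = (
    "uip gov aops-policy deployment tenant configure "
    f"--tenant-name Prod --input policy.json --output json {GUID}"
)

for name, cmd in [("positional_first", positional_first), ("flags_first", flags_first)]:
    print(name, bool(STRICT.search(cmd)), bool(RELAXED.search(cmd)))
```

`STRICT` accepts only `positional_first`, while `RELAXED` accepts both orderings and still rejects a command that omits `--output json`.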

6. tests/tasks/uipath-governance/aops-policy/aops_deployed_policy_query_smoke.yaml

Severity: OK

Clean smoke test. command_pattern validates the three positional args in order (NoLicense.*AITrustLayer.*00000000-...) plus --output json. Single primary criterion with correct weight (2.5). Tags correct.

7. tests/tasks/uipath-governance/aops-policy/aops_diagnose_policy_not_applied_smoke.yaml

Severity: OK

Good diagnostic smoke test. Checks two commands: deployed-policy get|list (weight 2.5) and deployment group get|list (weight 2.0). The | alternation in patterns is a nice touch — allows flexibility in how the agent approaches diagnosis. Tags correctly include mode:diagnose.
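One caveat with `|` in these patterns: alternation binds more loosely than concatenation, so an ungrouped `deployed-policy get|list` means "deployed-policy get" or the word "list" anywhere. The review does not quote the exact patterns; assuming they resemble the sketch below (commands hypothetical), grouping with `(?:...)` keeps the alternation scoped to the subcommand:

```python
import re

# Scoped: "deployed-policy" followed by either "get" or "list".
GROUPED = re.compile(r"deployed-policy (?:get|list)\b")
# Unscoped: "deployed-policy get" OR the bare word "list" anywhere.
UNGROUPED = re.compile(r"deployed-policy get|list")

commands = [
    "uip gov aops-policy deployed-policy get NoLicense AITrustLayer <tenant-id> --output json",
    "uip gov aops-policy deployed-policy list --output json",
    "uip gov access-policy list --output json",  # unrelated command
]
for cmd in commands:
    print(bool(GROUPED.search(cmd)), bool(UNGROUPED.search(cmd)), cmd)
```

The ungrouped form also credits the unrelated `access-policy list` call, which would let a diagnostic criterion pass for the wrong reason.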

8. tests/tasks/uipath-governance/access-policy/access_evaluate_smoke.yaml

Severity: OK

Validates access-policy evaluate with both --resource-type and --actor-process-type. Weights well-distributed (2.5 primary, 1.5 supporting, 1.0 file). Tags correct.

9. tests/tasks/uipath-governance/access-policy/access_diagnose_blocked_invocation_smoke.yaml

Severity: OK

Validates the diagnose workflow: evaluate + list. Good that it tests two commands (evaluate for the symptom, list for candidate policies). Tags correct.

What's Missing

  • No sandbox: or agent: overrides in any test YAML — correct; all inherit from tests/experiments/default.yaml.
  • No env_packages — correct per test-writing rules (CLI installed globally by runner).
  • Missing passing-run claims: the test-writing rules say to "run the task with coder-eval and add a passing-run claim to the PR." None of the 5 new tasks has a passing-run claim. Lint flags this as High; since the PR description frames the change as scorecard coverage and the tests validate command shapes (which can't pass without a live tenant), this review rates it Medium. Add passing-run claims or note why they're deferred.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | OK | 833 chars, under 1024 limit. Operate/Diagnose verbs added cleanly. |
| E2E Tests | OK | Pre-existing e2e tests cover build lifecycle. New tests are all smoke (correct for operate/diagnose command-shape validation). |
| Skill Body | OK | Troubleshoot section well-placed. Reference Navigation updated. |
| References & Assets | Issue | 6 broken relative links in failure-modes.md (../ should be ../../). |
| Repo Hygiene | OK | CODEOWNERS covers both /skills/uipath-governance/ and /tests/tasks/uipath-governance/. No secrets. Changes scoped to governance skill. |

Issues for Manual Review

  • CLI command accuracy: The deployment tenant configure verb (in aops_deploy_tenant_smoke.yaml) and the --tenant-name / --input flags should be verified against the actual CLI. The reviewer should confirm these flags exist on the current @uipath/cli release.
  • access-policy evaluate flag names: Verify --resource-type, --resource-id, --actor-process-type, --actor-process-id are the actual flag names (not --resourceType etc.).
  • Failure mode fix references: Lines 102 and 135 of failure-modes.md reference plugins/actor/impl.md, plugins/tags/planning.md, and plugins/selector/impl.md — these exist in the repo but the content should be verified to contain the referenced anchors (#deny-to-allow-flip, Robot intent section).

Conclusion

Solid path-to-green PR that follows established patterns well. One High issue blocks merge: the 6 broken relative links in failure-modes.md where ../ needs to be ../../. This is a straightforward fix (all instances are the same off-by-one in directory depth). Everything else — SKILL.md changes, CAPABILITY.md, troubleshooting guide, and all 5 test YAMLs — looks clean and well-structured.

Recommendation: Request changes — fix the broken links in failure-modes.md, then this is ready to merge.



github-actions Bot commented Jun 23, 2026


Claude finished @t-hsia's task in 3m 42s.


Coder-eval task lint (advisory)

5 task YAMLs changed; per-task verdicts: 0 Critical, 0 High, 1 Medium, 0 Low, 4 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

High — PR body does not claim the changed tasks have been run and passed. The test plan section contains unchecked - [ ] items describing what to verify, but no assertion that any task was actually executed. Please edit the PR description to add a line like:

Ran <task-id> locally and it passed.

Per-task lint

tests/tasks/uipath-governance/access-policy/access_diagnose_blocked_invocation_smoke.yaml — verdict: OK

No issues. Goal-oriented prompt ("diagnose why an agent is being blocked") without prescribing access-policy evaluate or access-policy list. Two distinct command_executed criteria validate a multi-step diagnostic workflow.

tests/tasks/uipath-governance/access-policy/access_evaluate_smoke.yaml — verdict: OK

No issues. Single-command operate test with two complementary command_executed checks (--resource-type + --actor-process-type). Prompt states the goal without naming the CLI subcommand.

tests/tasks/uipath-governance/aops-policy/aops_deploy_tenant_smoke.yaml — verdict: OK

No issues. Primary criterion uses lookaheads to verify tenant ID, --tenant-name, --input, and --output json in a single pattern — strong shape validation for a smoke test.

tests/tasks/uipath-governance/aops-policy/aops_deployed_policy_query_smoke.yaml — verdict: Medium

Issues:

  • [Medium] Near-duplicate: existing deployed_policy_smoke.yaml (same folder) already tests deployed-policy get with three positional args + --output json. This task uses different fixture values (NoLicense/AITrustLayer vs Attended/StudioX) and tags mode:operate instead of mode:diagnose, but the CLI operation under test is structurally identical. The existing task additionally validates deployed-policy list, making it a superset.

Suggested fixes:

  • Consider whether the mode:operate tag alone justifies a separate task. If yes, consider strengthening this task by adding a criterion the existing diagnose task doesn't cover (e.g., checking that the agent does NOT also run deployed-policy list — since an operate query should be targeted, not exploratory). Otherwise, consider adding mode:operate as a second tag to the existing deployed_policy_smoke.yaml and dropping this file.

tests/tasks/uipath-governance/aops-policy/aops_diagnose_policy_not_applied_smoke.yaml — verdict: OK

No issues. Tests a two-step diagnostic workflow (deployed-policy get|list + deployment group get|list) that is materially distinct from all existing siblings. Good use of alternation in command_pattern to accept both get and list.

Within-PR duplicates

  • [Medium] Cluster: access_evaluate_smoke.yaml + access_diagnose_blocked_invocation_smoke.yaml — both share an identical primary command_executed criterion (uip gov access-policy evaluate ... --resource-type ... --output json, weight 2.5, same regex). Each adds marginal coverage: the operate task checks --actor-process-type, the diagnose task checks access-policy list. Not interchangeable, but an agent that passes one is likely to pass the shared criterion of the other. Consider differentiating the evaluate regex (e.g., the diagnose task could drop the --resource-type assertion on evaluate and rely on it only for the list step).

Conclusion

⚠ 1 task has a per-task issue (max severity Medium), plus 1 within-PR cluster (Medium) and a missing passing-run claim (High). Advisory only — not blocking merge. The biggest action item is adding a passing-run claim to the PR description.


Note: CLI verb reachability checks (scripts/check-cli-verbs.py) could not be run due to sandbox permissions. Manually inspected all command_pattern values — all reference uip gov access-policy or uip gov aops-policy subcommands consistent with the existing test suite.

t-hsia and others added 2 commits June 23, 2026 12:04
…prompt consistency in path-to-green PR

- failure-modes.md: fix `template get "<TEMPLATE_ID>"` placeholder to
  `"<PRODUCT_NAME>"` matching the actual CLI positional argument
- aops_deployed_policy_query_smoke: change lifecycle:setup to
  lifecycle:discover (querying deployed policy is read-only)
- access_evaluate_smoke: change lifecycle:setup to lifecycle:discover
  (evaluate is a PDP query, not a mutation)
- aops_diagnose_policy_not_applied_smoke: fix prompt saying "Studio
  policy" when the scenario uses product AITrustLayer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ID requirement

Agent may not place the tenant GUID as a positional arg before the
flags. Relax regex to check for `deployment tenant configure` with
--input and --output json, without requiring exact GUID position.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

**Symptom:** Admin deployed a policy to a tenant (or group), but a specific user still sees the old behavior or no policy at all.

**Causes:**


I would add a scenario where a user has multiple group policies and the one with the lower priority wins instead of the intended one.


The solution is to lower the priority of the new policy (a lower priority makes it more important).

Compare the assignments at each level for the target product:
- If a user override exists (including explicit `null`), it wins over group and tenant.
- If a group override exists, it wins over tenant for members of that group.
- A `null` override means explicit "No Policy" — it is NOT the same as "no assignment" (which would inherit from the next level).
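The precedence rules above, plus the multi-group case raised in the review comment (the lower priority number wins), can be summarized as a small resolver. This is a sketch of the rules as written in this thread, not UiPath's implementation; the data model, the `UNASSIGNED` sentinel, and the tie-breaking behavior are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

# "No assignment at this level": inherit from the next level down.
# Distinct from None, which models an explicit null ("No Policy").
UNASSIGNED = object()


@dataclass
class GroupAssignment:
    group: str
    priority: int          # lower number = more important
    policy: Optional[str]  # None = explicit "No Policy"


def effective_policy(user=UNASSIGNED, groups=(), tenant=UNASSIGNED):
    """Return (source level, policy) for one product; policy None means "No Policy".

    `groups` holds only the user's groups that have an assignment for this product.
    """
    if user is not UNASSIGNED:
        # A user override wins, even when it is an explicit null.
        return "user", user
    if groups:
        # Among the user's groups, the lowest priority number wins;
        # min() keeps the first group listed on ties (an assumption).
        winner = min(groups, key=lambda g: g.priority)
        return f"group:{winner.group}", winner.policy
    if tenant is not UNASSIGNED:
        return "tenant", tenant
    return "unassigned", None
```

For example, with groups Sales (priority 2, policy "Legacy") and Pilot (priority 5, policy "NewPolicy"), the resolver returns Sales's "Legacy" policy, which is the "wrong policy wins" scenario; lowering Pilot's priority number to 1 flips the result.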


Similar to above, I would add a step above this line to also compare group policy priorities, where the lower priority wins.

@grant-tannert grant-tannert left a comment


approved with comments

1. Selector rule too broad — matches resources beyond the intended scope
2. Actor rule excludes the calling user/group
3. Executable rule blocks the actor process type
4. Tags on the resource trigger a deny-intent policy


Nitpick: "Tags on the resource or executable trigger a deny-intent policy"

t-hsia and others added 2 commits June 23, 2026 15:55
…back

Addresses grant-tannert's review comments:
- failure-modes.md: add cause #3 for multi-group scenario where lower
  priority number wins; update fix guidance
- troubleshooting-guide.md: add group priority comparison step to
  precedence checking (Step 3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@t-hsia t-hsia merged commit dff6585 into main Jun 23, 2026
14 checks passed
@t-hsia t-hsia deleted the feat/governance-path-to-green branch June 23, 2026 23:09