
feat(coded-apps): NLP-driven dashboard generation (compiler architecture) + coder-eval suite #1591

Open
NishankSiddharth wants to merge 5 commits into main from feat/coded-apps-dashboards

@NishankSiddharth NishankSiddharth commented Jun 19, 2026

What this is

The uipath-coded-apps skill gains a natural-language dashboard generation capability. A UiPath admin / CoE lead describes the metrics they want in plain English — "agent health, error-rate trend for the week, top agents by usage" — and the skill generates a live, deployable React dashboard wired to their tenant's data through the @uipath/uipath-typescript SDK. No hand-coding.

How it works (end-to-end flow)

  1. Intent routing. On a dashboard request the skill classifies intent — build / edit / deploy — and fires a parallel "blast" (load the relevant docs, check for an existing dashboard via state.json, kick off a background uip login status, pre-warm dependencies) before showing anything.

  2. Plan — a text-only gate. It resolves every requested metric to a concrete SDK method, refuses impossible metrics inline (with a suggested alternative), and presents a plain-language plan. This turn is pure text, zero tool calls, ending with a confirm/change affordance. (A question/option tool fired in the same turn reliably suppresses the plan rendering, so the gate is deliberately text-only; setup questions like OAuth come after approval.)

  3. Compile. On approval the skill authors two things and compiles them:

    • intent.json — pure metadata (schemaVersion 2): dashboard name, time range, tenant + OAuth config, and a list of metric descriptors. No embedded code.
    • metrics/<name>.ts — one real TypeScript module per metric, each exporting fetchData(sdk, getToken) (plus optional drill-down fetchers). Each is written from curated SDK reference docs and cross-checked against the documented example response.

    The build engine then compiles these in stages: an isolated typecheck of the metric modules, then generation of widgets, routes, and detail views, then a full tsc --noEmit backstop. The output is a Vite + React + Tailwind + Recharts app.

  4. Run / deploy. The app runs locally behind OAuth-PKCE login; a deploy flow packs, publishes, and deploys it to Automation Cloud as a coded web app, tracking version/routing state across deploys.
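
The two authored artifacts from step 3 can be pictured with a minimal sketch. The field names on the intent object, the `SdkLike` interface, and the `agents.getAll()` call shape are assumptions made for illustration; the real types come from @uipath/uipath-typescript and the real schema from the skill's references, neither of which is reproduced here.

```typescript
// Hypothetical stand-ins. Only "schemaVersion 2", "fetchData(sdk, getToken)",
// and "no embedded code in intent.json" are taken from the PR description.
interface AgentRow {
  name: string;
  status: "Healthy" | "Degraded" | "Failed";
}

// The slice of the SDK this one metric depends on (assumed shape).
interface SdkLike {
  agents: { getAll(): Promise<AgentRow[]> };
}

type GetToken = () => Promise<string>;

// intent.json modeled as pure metadata: descriptors point at modules by name.
interface IntentV2 {
  schemaVersion: 2;
  name: string;
  timeRange: string;
  metrics: { id: string; widget: "kpi" | "table" | "line" | "area"; module: string }[];
}

const intent: IntentV2 = {
  schemaVersion: 2,
  name: "Agent health",
  timeRange: "7d",
  metrics: [{ id: "agent-health", widget: "kpi", module: "metrics/agent-health.ts" }],
};

// metrics/agent-health.ts: the required export, with real logic in a real module.
// getToken is accepted to match the documented signature; this sketch does not use it.
export async function fetchData(
  sdk: SdkLike,
  _getToken: GetToken,
): Promise<{ healthy: number; total: number }> {
  const rows = await sdk.agents.getAll();
  const healthy = rows.filter((row) => row.status === "Healthy").length;
  return { healthy, total: rows.length };
}
```

Because the module only sees a typed `sdk` argument, it can be typechecked in isolation and exercised with a stub, which is the property the compile step relies on.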

Why it's built this way

  • Compiler model — metric modules, not embedded code. Because each metric is a real .ts module, the generated data layer is type-checked end-to-end. The classic "compiles green but returns zero rows" failure is caught by a deterministic cross-check of each module against its documented SDK example response, and type errors are repaired in a bounded retry loop.
  • Pure-metadata intent.json. Keeping the spec declarative lets the engine own all code generation (widgets, routes, drill-downs). Edits, rebuilds, and version upgrades regenerate the disposable layer without clobbering authored logic.
  • Correctness via curated SDK docs, not a live probe. Every documented method ships a realistic example response; the author validates the fields and values it depends on against that example — no runtime, auth, or rate-limit cost.
  • Read-only by construction. Dashboards never mutate state; tier resolution forbids any write method from appearing in metric code.
  • Governance gating. Generic governance intent ("policy denials", "allow/deny verdicts") routes to the out-of-box Insights-API metrics; only explicit runtime-compliance / standard-clause / rule-violation intent opens the gate to the interim trace-derived metrics. The two families are kept strictly separate so neither regresses into the other.
  • Best-effort subagent. The build prefers to run inside a host sub-task (so the main thread stays clean) but falls back to an identical inline build when no sub-task mechanism exists — subagents are an optimization, never a hard requirement.
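
As a rough illustration of the first bullet, the cross-check and the bounded retry could look like the sketch below. The dotted-path convention, the function names, and the default of three attempts are all assumptions; the PR does not show how build-dashboard.mjs implements either piece.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Return the dotted field paths that do NOT resolve in the documented example
// response. Arrays are checked through their first element (an assumption).
function missingPaths(example: Json, paths: string[]): string[] {
  return paths.filter((path) => {
    let node: Json | undefined = example;
    for (const key of path.split(".")) {
      if (Array.isArray(node)) node = node[0];
      if (node === null || node === undefined || typeof node !== "object" || Array.isArray(node)) {
        return true; // cannot descend any further
      }
      if (!(key in node)) return true; // field absent from the example
      node = node[key];
    }
    return false;
  });
}

// Run a check, hand its errors to a repair step, and stop after maxAttempts.
function repairWithRetries(
  check: () => string[],
  repair: (errors: string[]) => void,
  maxAttempts = 3,
): { ok: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const errors = check();
    if (errors.length === 0) return { ok: true, attempts: attempt };
    if (attempt < maxAttempts) repair(errors);
  }
  return { ok: false, attempts: maxAttempts };
}
```

A module that reads `items.errorRate` when the documented example only carries `items.errorCount` would typecheck against a loose type yet render nothing; `missingPaths` reports the path before any build runs, and the retry loop caps how often a fix is attempted.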

How it's tested

The coder-eval suite exercises the skill exactly as a user would invoke it: a coding agent loads the skill in a sandbox from a plain prompt, follows the workflow to produce real files, and deterministic criteria inspect the artifacts (intent.json, metric modules, generated widgets, .env.local contract, state.json) — never a self-written summary. A shared validator (_shared/check_dashboard.py) auto-locates the generated project (the skill scaffolds into a per-dashboard <routingName>/ subdirectory) and checks the full compiler-model structure; the e2e gold gate additionally runs tsc --noEmit over the generated app.

Tiers (per the repo taxonomy): smoke = fast PR-gate checks that need no full build (40-turn budget, every PR); integration + e2e = full inline builds, run nightly (200-turn budget). Modes (Coding-Agents scorecard): build (generate/edit), operate (deploy/run), diagnose (investigate/fix).

Test inventory (17 tasks)

| Test | Tier | Mode | What it verifies |
| --- | --- | --- | --- |
| dashboard_plan_gate | smoke | build | The approval gate: agent presents a plan and writes no files / runs no npm ci before the user confirms. |
| dashboard_plan_before_question | smoke | build | Plan response is pure text: no AskUserQuestion, no intent.json, no npm ci before approval; OAuth/setup questions fire only after. |
| dashboard_disambiguate | smoke | build | An ambiguous prompt makes the agent ask a clarifying question and halt, with no premature scaffold (package.json not created). |
| dashboard_deploy_smoke | smoke | operate | Deploy command sequence pack → publish → deploy in order; -n is the friendly display name (not the routing slug); never -t Action (dashboards are web apps). |
| auth_no_manual_tokens | smoke | validate | Coded-apps auth (non-dashboard): agent uses CLI-managed auth and refuses to pass/paste manual access tokens. |
| dashboard_scaffold | integration | build | A 2-widget build produces the compiler-model shape: pure-metadata intent.json v2, metric modules exporting fetchData, generated widgets, and the 6-var OAuth-PKCE .env.local contract. |
| dashboard_full_e2e | e2e | build | Full build of a 4-widget agent-health dashboard (KPI + area + line + table) with real SDK calls (getAll + getConsumptionTimeline + getErrors), named time constants, state.json, and a clean tsc --noEmit gold gate. |
| dashboard_agent_job_classification | integration | build | Agent metrics derived from jobs use the raw OData field ProcessType eq 'Agent'; forbids sourceType and filtering on the SDK-mapped processName. |
| dashboard_refuse_invocation_timeline | integration | build | An impossible "invocation volume over time" is substituted with the documented Jobs run-volume trend; forbids fabricating a non-existent getInvocation* SDK method. |
| dashboard_gov_gated_routing | integration | build | Explicit runtime-compliance / standard-clause intent opens the governance gate → trace-derived metrics (parseGovernanceSpans + Traces.getById). |
| dashboard_gov_generic_no_regression | integration | build | Generic "governance/policy" intent routes to Insights-API metrics (getPolicyTraces/getOperationSummary) and does not regress into the trace-derived family. |
| dashboard_rowlink_clickable | integration | build | A clickable table wires a non-empty ROW_LINK_KEY + onRowClick handler and generates a *DetailView.tsx drill-down page. |
| dashboard_edit_change_widget | integration | build | CHANGE op via the incremental editor: builds a widget, then changes its display type through an edit-intent.json CHANGE; regenerates in place, reuses the kit (no re-scaffold). |
| dashboard_edit_remove_widget | integration | build | REMOVE op via the incremental editor: builds 2 widgets, removes 1; asserts exactly one metric module remains and the kit was reused. |
| dashboard_incremental_add | e2e | build | ADD op (full edit lifecycle): builds a 1-widget dashboard, then adds a 2nd via edit-intent.json, reusing the already-extracted kit. |
| dashboard_diagnose_broken_metric | integration | diagnose | Validate→read-error→fix loop: a seeded metric calls a non-existent SDK method; the agent reads the METRICS_RETRY/typecheck error and fixes the built module to the real getAll. |
| dashboard_deploy_state | integration | operate | Fresh deploy path (systemName empty in state.json): resolves the folder and deploys without a version bump. |

Tier split: 5 smoke · 10 integration · 2 e2e. Mode coverage: build, operate, and diagnose all represented.

Generated with Claude Code

github-actions Bot commented Jun 19, 2026

Contributor

Claude finished @NishankSiddharth's task in 10m 49s


PR Review: feat(coded-apps): NLP-driven dashboard generation (compiler architecture) + coder-eval suite

  • Read review criteria and project conventions
  • Analyze full PR diff (60 files, +8068 / −2222)
  • Review SKILL.md changes and frontmatter
  • Review new dashboard references and build scripts
  • Review SDK reference additions/changes
  • Review test suite (tasks, shared validator, activation lines)
  • Check CODEOWNERS, cross-skill refs, links, secrets
  • Post findings

Summary

Adds NLP-driven dashboard generation to the uipath-coded-apps skill — a compiler architecture where natural-language metric descriptions produce type-checked React dashboards backed by the @uipath/uipath-typescript SDK. The PR includes the full reference documentation tree (CAPABILITY.md, build/deploy plugins, tier resolution, SDK references for Agents/Traces/Governance), a 1900-line build script, a capability registry, and a 13-task coder-eval suite with a shared Python validator. Also refactors the action-app template for token optimization and removes the deleted Validation Station widget references.

Change-by-Change Review

1. skills/uipath-coded-apps/SKILL.md — frontmatter + body changes

Severity: Medium

Frontmatter is valid: name matches folder, description is 362 chars (under 1024 cap), front-loads identity ("UiPath Coded Web Apps, Action Apps…and NLP-driven dashboards"), includes redirects for siblings (→uipath-rpa, →uipath-agents, →uipath-maestro-flow). Task added to allowed-tools for the subagent mechanism. All good.

Issue — stale > **Preview** callout at line 9. Per .claude/rules/skill-structure.md § Lifecycle Status: "There is NO status marker in SKILL.md (frontmatter or body)… Keeping status in one machine-readable file lets agents and the generated README table report it consistently." The status is already correctly set to "preview" in assets/skill-status.json. The body callout at line 9 violates the single-source-of-truth rule and will be flagged by scripts/check-skill-status.py.

Issue — Task Navigation table missing new SDK references. The table at lines 72–90 links to CAPABILITY.md (line 90) but does not include the four new SDK references (agents.md, traces.md, governance.md, governance-traces.md). These are reachable through CAPABILITY.md → tier-resolution.md, but the Task Navigation table is the primary index for the app-building path. The dashboard path gets to them via CAPABILITY.md, but an agent working on an app that needs Insights RTM data (outside the dashboard workflow) would miss them.

Critical Rule 7 change: The old rule ("Never handle access tokens manually") was broader — it covered reading, printing, parsing, sourcing, and setting cached tokens. The new rule ("Never pass access tokens as CLI flags") is narrower and specifically about JWT length. The old wording protected against more failure modes (sourcing ~/.uipath/.auth, exporting tokens in shell). This is a judgment call — the new wording is more focused but loses some defensive breadth.

2. references/dashboards/CAPABILITY.md (177 lines) — dashboard entry point

Severity: OK

Well-structured routing document. Intent classification (BUILD/EDIT/DEPLOY) is clear. The parallel file-read blast is well-specified with conditional reads for domain-specific SDK docs. Hard stops section at the end is prescriptive and catches the key anti-patterns. The "never read build-dashboard.mjs" and "never read files one at a time" rules are good agent guardrails.

3. references/dashboards/plugins/build/impl.md (404 lines) — build plugin

Severity: OK

Thorough build specification. The phased approach (preflight → plan → approval → setup → cross-check → build) is well-documented. The plan format template with icon conventions is a strong LLM usability feature. Phase 3.5 cross-check against documented example responses is a smart correctness mechanism. The subagent spawn prompt is complete. Widget-type conventions with rendering descriptions give the agent enough context to produce accurate plans.

4. references/dashboards/plugins/deploy/impl.md (273 lines) — deploy plugin

Severity: OK

Solid deploy pipeline specification. The "no --version on deploy" and "no --path-name on upgrade" rules address the most common deploy failures. Version auto-bump retry logic is sound. The governance pinning choice uses the structured-choice pattern correctly.

5. references/dashboards/primitives/tier-resolution.md (451 lines) — metric classification

Severity: OK

Comprehensive metric catalog covering T0 (hard refuse), T1 (catalog with display hints), T2 (parametric), and T3 (custom). The SDK validation three-step check is prescriptive. The packageType vs sourceType canonical trap is well-documented with correct/incorrect code examples. All calling convention differences between services are documented in the footer table.

6. assets/scripts/build-dashboard.mjs (1907 lines) — build script

Severity: OK

No eval() calls, no secrets, no personal paths, no cross-skill imports. All execSync calls use hardcoded commands or resolve()-d paths with double-quoting. Implements the intent validation, scaffold extraction, env writing, metric typecheck, widget generation, and tsc backstop as documented.

7. assets/scripts/setup-admin-folder.mjs (144 lines) — folder provisioner

Severity: Medium (security)

Shell injection surface at line 28. The uip() helper uses execSync(`uip ${args} --output json`) with string interpolation. At lines 64 and 96, FOLDER_NAME (from process.argv[2]) is interpolated directly: uip(`or folders create "${FOLDER_NAME}"`). A malicious folder name could inject shell commands. Practical risk is low (the agent controls the argument), but execFileSync('uip', [...argArray, '--output', 'json']) would close the vector.
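
A minimal sketch of the suggested change, assuming the helper can take an argument array instead of a string. The names `uipArgv` and `uip` here are illustrative and do not reproduce the actual contents of setup-admin-folder.mjs; only the `uip or folders create` verb and the `--output json` flag come from the review text.

```typescript
import { execFileSync } from "node:child_process";

// Build the argv array. Each element reaches the child process as exactly one
// argument, however many quotes or semicolons it contains.
function uipArgv(args: string[]): string[] {
  return [...args, "--output", "json"];
}

// execFileSync does not spawn a shell by default, so the folder name is never
// parsed as shell syntax.
function uip(args: string[]): unknown {
  const stdout = execFileSync("uip", uipArgv(args), { encoding: "utf8" });
  return JSON.parse(stdout);
}

// Usage (not executed here): uip(["or", "folders", "create", folderName]);
```

One caveat worth checking before adopting this: if uip is installed as a .cmd shim on Windows, launching it without a shell may need extra handling.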

8. assets/scripts/capability-registry.json (608 lines) — metric catalog

Severity: OK

Valid JSON with 30 T1 metrics, 5 T2 parametric metrics, and 7 hard-refuse entries. All metrics reference correct SDK services and subpaths. Descriptions include constructor patterns, method signatures, and response shapes. Aliases provide natural-language trigger matching.

9. assets/scripts/tests/resolution.test.mjs (1367 lines) — build script unit tests

Severity: OK

Thorough test suite using Node.js built-in node:test and node:assert/strict. Tests all major functions: intent validation, metric resolution, widget generation, edit classification, event parsing, column compilation. Includes regression guards that verify structural invariants in the source code.

10. SDK reference files (new: agents.md, traces.md, governance.md, governance-traces.md)

Severity: OK

All four new files follow kebab-case naming, have properly fenced code blocks with language identifiers, contain no cross-skill references, no secrets, and no filler prose. Each clearly documents calling conventions (positional Date args vs options objects), response shapes (bare arrays vs .items), and example responses. governance-traces.md properly documents the gated, interim nature of the trace-derived governance metrics.

11. SDK reference files (modified: orchestrator.md, maestro.md, action-center.md, feedback.md, etc.)

Severity: OK

Changes add dashboard-relevant information: job classification (packageType vs sourceType trap in orchestrator.md), Maestro Insights methods (maestro.md), and trim redundant content. feedback.md and action-center.md are trimmed for token optimization (removing methods not used by the dashboard path). conversational-agent.md drops the user-settings section (moved to the relevant scope reference). imports.md correctly adds the four new subpaths.

12. references/oauth-scopes.md — scope table updates

Severity: Low

Buckets scopes changed from OR.Buckets/OR.Buckets.Read to OR.Administration/OR.Administration.Read. Several Maestro Insights rows moved to a new "Maestro Insights — RTM" section. Validation Station widget scopes removed (widget was deleted). New sections added for Agents, Agent Traces, Agent Memory, and Governance scopes. Structurally clean.

Note: The Buckets scope change (lines 50–54) from OR.Buckets to OR.Administration is a functional change that affects non-dashboard apps using Buckets. Verify this reflects an actual SDK/API change rather than an error.

13. references/create-action-app.md — token optimization refactor (+196/−285)

Severity: OK

Significant token reduction through prose compression, removal of the DU Validation Station reference (widget was deleted), and consolidation of the action-schema workflow. CSS template files removed from assets/templates/action-app/ — these were embedded inline in the refactored template. The refactored version is more prescriptive and follows token-optimization rules.

14. Deleted files

Severity: OK

  • references/widgets/validation-station.md (308 lines) — removed with all references cleaned up. No dangling links.
  • scripts/validate-action-schema.js (241 lines) — removed. No references remain.
  • assets/templates/action-app/*.css (4 files, 701 lines) — CSS consolidated into the refactored action-app template.
  • tests/tasks/uipath-coded-apps/auth_no_manual_tokens.yaml (40 lines) — smoke test for the old "never handle tokens manually" rule. Deletion aligns with the Critical Rule 7 rewording.

15. Test suite — 13 dashboard tasks + activation lines + shared validator

Severity: OK

Tier coverage: 6 smoke, 6 integration, 1 e2e — meets the requirement of ≥1 smoke + ≥1 e2e. All 13 tasks have required tags (uipath-coded-apps as first tag, tier, mode:*), valid task_id patterns, and at least one behavioral success criterion. No @uipath/cli in env_packages. Prompts are minimal.

Activation lines: 12 new entries (IDs 051–062) in uipath-coded-apps.jsonl covering dashboard-related natural-language prompts. All correctly set expected_skill: uipath-coded-apps.

Shared validator (_shared/check_dashboard.py, 242 lines): Well-structured Python script that validates project structure, OAuth env vars, intent.json purity (no fnBody), metric modules, widget generation, rowLink wiring, and optionally runs tsc --noEmit.

Minor: dashboard_plan_gate.yaml has only 1 criterion (command_executed checking npm ci was NOT run) and lacks the skill_triggered check that 11 of 13 tasks include. This is the thinnest test.

16. .gitattributes + binary .zip fixture

Severity: Medium

The .github/code_review.md states "No binary files or images committed (use text formats)." This PR adds governance-dashboard-starter-kit.zip (66KB) as a binary fixture. The .gitattributes correctly marks *.zip binary. The build architecture requires the zip (it's the React scaffold extracted at build time), and the PR description explains it's maintained externally and copied in. The alternative (checking in the scaffold as source) would add 100+ files. This is a pragmatic tradeoff, but it does conflict with the stated guideline.


What's Missing

  • Task Navigation table gaps: The four new SDK references (agents.md, traces.md, governance.md, governance-traces.md) are reachable through the dashboard path (CAPABILITY.md) but missing from SKILL.md's primary index table, limiting discoverability for the app-building path.
  • > **Preview** callout in SKILL.md body: Must be removed — lifecycle status belongs only in assets/skill-status.json per repo rules.
  • CODEOWNERS for the new dashboard/ test subdirectory: The existing CODEOWNERS entry at /tests/tasks/uipath-coded-apps/ covers it via glob, so this is structurally fine. No action needed.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | OK | Valid YAML, 362 chars, proper redirects, Task tool added |
| E2E Tests | OK | 6 smoke + 6 integration + 1 e2e; all pass structural checks; shared validator is solid |
| Skill Body | Issue | > **Preview** callout violates single-source-of-truth rule; new SDK refs missing from Task Navigation |
| References & Assets | OK | Comprehensive, self-contained, no cross-skill refs, no broken links, good token optimization |
| Repo Hygiene | Issue | Shell injection surface in setup-admin-folder.mjs; binary .zip committed (pragmatic but conflicts with guideline) |

Issues for Manual Review

  • Buckets scope change (oauth-scopes.md): The Buckets service scopes changed from OR.Buckets/OR.Buckets.Read to OR.Administration/OR.Administration.Read. This affects all non-dashboard apps using Buckets. Verify this reflects an actual SDK/API change.
  • Critical Rule 7 narrowing: The old rule ("Never handle access tokens manually" — covered sourcing, printing, parsing, setting cached tokens) was broader than the new rule ("Never pass access tokens as CLI flags" — JWT length concern). The deleted auth_no_manual_tokens.yaml test validated the broader rule. Confirm the narrower rule is intentional.
  • SDK method existence: The new Insights RTM methods (Agents.getAll, AgentTraces.getErrorsTimeline, AgentMemory.getTimeline, Governance.getPolicyTraces, etc.) and Maestro Insights methods (Cases.getTopRunCount, MaestroProcesses.getInstanceStatusTimeline, CaseInstances.getSlaSummary, etc.) — verify these exist in the installed @uipath/uipath-typescript version (≥ 1.4.1 per the references).

Conclusion

This is a well-architected feature addition with strong documentation discipline and thorough test coverage. The compiler model (intent.json + metric modules → type-checked dashboard) is a solid approach, and the reference documentation is prescriptive and LLM-friendly.

Two items should be fixed before merge:

  1. Remove the > **Preview** callout from SKILL.md line 9 (violates the lifecycle status single-source-of-truth rule and will fail CI validation).
  2. Fix the shell injection surface in setup-admin-folder.mjs — replace execSync string interpolation with execFileSync argument arrays.

One item is recommended (Medium):
3. Add the four new SDK references to the SKILL.md Task Navigation table.

Everything else (test structure, SDK references, build script, capability registry, dashboard reference tree) is solid.

github-actions Bot commented Jun 19, 2026

Contributor

Claude finished @NishankSiddharth's task in 4m 14s


Coder-eval task lint (advisory)

13 task YAMLs changed; verdicts: 0 Critical, 0 High, 1 Medium, 2 Low, 10 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

High — PR body describes the test design and coverage approach but does not claim the changed tasks have been run and passed. Please edit the PR description to add a line like: Ran all 13 dashboard tasks locally and they passed.

Per-task lint

tests/tasks/uipath-coded-apps/dashboard/build/dashboard_full_e2e.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/deploy/dashboard_deploy_smoke.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/deploy/dashboard_deploy_state.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/deploy/dashboard_deploy_upgrade.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/detail/dashboard_rowlink_clickable.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/governance/dashboard_gov_gated_routing.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/governance/dashboard_gov_generic_no_regression.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/routing/dashboard_agent_job_classification.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/smoke/dashboard_disambiguate.yaml — verdict: Low (theme-captured; see Theme 2)

tests/tasks/uipath-coded-apps/dashboard/smoke/dashboard_metric_modules.yaml — verdict: Low (theme-captured; see Theme 1)

tests/tasks/uipath-coded-apps/dashboard/smoke/dashboard_plan_before_question.yaml — verdict: OK

tests/tasks/uipath-coded-apps/dashboard/smoke/dashboard_plan_gate.yaml — verdict: Medium

Issues:

  • [Medium] Meaningful coverage (line 20–27): single negative-only criterion (npm ci not run, weight 3.0). No skill_triggered or positive artifact check — the test cannot distinguish "agent showed a plan and waited" from "agent did nothing."

Suggested fixes:

  • Add skill_triggered for uipath-coded-apps (weight 1.0) so the test confirms the skill was at least invoked before the gate held, consistent with dashboard_plan_before_question.yaml which already has this.

tests/tasks/uipath-coded-apps/dashboard/smoke/dashboard_scaffold.yaml — verdict: OK

Within-PR duplicates

  • [Medium] Cluster 1: dashboard_scaffold.yaml, dashboard_metric_modules.yaml. Both build a 2-widget dashboard and assert the metric-module architecture (no fnBody, metrics/ folder, schemaVersion 2). dashboard_scaffold uses check_dashboard.py which already validates all of this plus env vars, widget counts, and fetchData exports. dashboard_metric_modules adds marginal signal (explicit schemaVersion grep) but is largely subsumed. Consider dropping dashboard_metric_modules or merging its unique check into dashboard_scaffold's check_dashboard.py invocation via --require-substr schemaVersion.
  • [Medium] Cluster 2: dashboard_plan_gate.yaml, dashboard_plan_before_question.yaml. Both test "agent presents plan and does not build before approval." dashboard_plan_before_question has 4 criteria (skill_triggered, no AskUserQuestion, no npm ci, no intent.json) while dashboard_plan_gate has only 1 (no npm ci). The former strictly subsumes the latter's assertion. The prompt difference (simple vs complex CoE request) provides mild novelty. Consider adding skill_triggered to dashboard_plan_gate to differentiate it as the "minimal gate" variant, or dropping it in favor of dashboard_plan_before_question.

Themes

  1. Near-duplicate of more-comprehensive sibling (Medium): dashboard_metric_modules (vs dashboard_scaffold) and dashboard_plan_gate (vs dashboard_plan_before_question) each overlap substantially with a sibling that runs strictly more assertions. Neither adds enough marginal coverage to justify the infra cost independently. Fix: consolidate or differentiate with unique positive criteria.
  2. Negative-only criteria with no positive anchor (Medium): dashboard_disambiguate and dashboard_plan_gate have zero positive criteria (no skill_triggered, no artifact check). Agent inaction, crash, or timeout satisfies all assertions. Fix: add at least skill_triggered where the agent should invoke the skill, or an llm_judge transcript check for disambiguation behavior where it shouldn't. (dashboard_plan_before_question already avoids this by including skill_triggered.)
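
Theme 2 can be made concrete with a small model of how criteria are evaluated. The `Transcript` shape and the criterion objects below are hypothetical and are not the coder-eval schema; they only show why a negative-only check cannot tell a held gate from a crashed run.

```typescript
// Hypothetical record of what an agent did during one task run.
interface Transcript {
  skillsTriggered: string[];
  commands: string[];
  filesWritten: string[];
}

interface Criterion {
  name: string;
  passes: (t: Transcript) => boolean;
}

// Negative criterion: satisfied whenever the command is absent.
const noNpmCi: Criterion = {
  name: "npm ci not run",
  passes: (t) => !t.commands.some((c) => c.startsWith("npm ci")),
};

// Positive anchor: satisfied only if the skill was actually invoked.
const skillTriggered: Criterion = {
  name: "skill_triggered uipath-coded-apps",
  passes: (t) => t.skillsTriggered.includes("uipath-coded-apps"),
};

function allPass(criteria: Criterion[], t: Transcript): boolean {
  return criteria.every((c) => c.passes(t));
}

// An agent that crashed or timed out before doing anything.
const crashed: Transcript = { skillsTriggered: [], commands: [], filesWritten: [] };

// An agent that loaded the skill, showed a plan, and waited for approval.
const heldGate: Transcript = {
  skillsTriggered: ["uipath-coded-apps"],
  commands: [],
  filesWritten: [],
};

const negativeOnlyPassesOnCrash = allPass([noNpmCi], crashed);
const anchoredPassesOnCrash = allPass([noNpmCi, skillTriggered], crashed);
const anchoredPassesOnHeldGate = allPass([noNpmCi, skillTriggered], heldGate);
```

With only `noNpmCi`, the crashed run passes; adding `skillTriggered` makes it fail while the held-gate run still passes, which is the distinction the lint asks for.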

CLI verb reachability

⚠ Could not run scripts/check-cli-verbs.py due to sandbox permissions. The deploy tasks reference uip codedapp pack|publish|deploy and uip or folders list — these appear to be valid catalog verbs but were not machine-verified.

Conclusion

⚠ 3 task(s) have issues, max severity Medium. Advisory only — not blocking merge. The test suite overall is well-designed: the shared check_dashboard.py validator provides strong structural and content assertions, the deploy tests cover materially distinct paths (fresh/upgrade/smoke), and the governance pair (gate-open/gate-closed) is a clean coverage pattern. The main suggestions are: (1) add a passing-run claim to the PR body, (2) add skill_triggered to dashboard_plan_gate and dashboard_disambiguate as positive anchors, and (3) consider consolidating dashboard_metric_modules into dashboard_scaffold.


NishankSiddharth force-pushed the feat/coded-apps-dashboards branch 2 times, most recently from 03f1607 to b938bcc, June 19, 2026 16:20
@NishankSiddharth

Author

@claude please review the PR again; the earlier file diff was not correct.

github-actions Bot commented Jun 19, 2026

Contributor

Claude finished @NishankSiddharth's task in 8m 3s


PR Review: feat(coded-apps): NLP-driven dashboard generation (compiler architecture) + coder-eval suite

  • Read review criteria and project conventions
  • Gather full PR diff (main...HEAD) — 49 files, +7693/−120
  • Review SKILL.md changes and frontmatter
  • Review new dashboard references and build scripts
  • Review SDK reference additions/changes
  • Review test suite (tasks, shared validator, activation lines)
  • Check CODEOWNERS, cross-skill refs, links, secrets
  • Post findings

Summary

Adds NLP-driven dashboard generation to uipath-coded-apps — a compiler architecture where natural-language metric descriptions produce type-checked React dashboards via the @uipath/uipath-typescript SDK (Insights RTM). The PR includes the full reference doc tree (CAPABILITY.md, build/deploy plugins, tier resolution, SDK references for Agents/Traces/Governance), a 1907-line build script, a capability registry, and a 13-task coder-eval suite with a shared Python validator. Also refactors uipath-rpa to remove uia-elements-interaction-guide.md (inlining key pitfalls into ui-automation-guide.md) and trims a blockquote from uipath-solution.


Change-by-Change Review

1. skills/uipath-coded-apps/SKILL.md — frontmatter + body

Severity: Medium

Frontmatter is valid: name matches folder, description is 362 chars (under 1024 cap), front-loads identity with file signals (app.config.json, action-schema.json), includes redirects for uipath-rpa, uipath-agents, uipath-maestro-flow. Task added to allowed-tools for the subagent mechanism. No > **Preview** callout (previous review's finding is resolved). Status correctly set to "preview" in assets/skill-status.json.

Issue — Task Navigation table missing new SDK references. Four new SDK references (agents.md, traces.md, governance.md, governance-traces.md) are reachable via CAPABILITY.md → tier-resolution.md, but absent from the primary Task Navigation table (lines 78–91). An agent building an app (not a dashboard) that needs Insights RTM data would miss them.

Critical Rule 7 narrowing — old rule ("Never handle access tokens manually") covered sourcing, printing, parsing, and setting cached tokens. New rule ("Never pass access tokens as CLI flags") focuses specifically on JWT length. The broader protection is lost. This is a judgment call — flagging for manual review.

2. references/dashboards/CAPABILITY.md (177 lines)

Severity: OK

Well-structured routing document. Intent classification (BUILD/EDIT/DEPLOY) is clear with a three-row table. The parallel file-read blast with conditional reads for domain-specific SDK docs is well-specified. Hard stops section (17 "never" rules) is prescriptive and catches the key anti-patterns. The "never read build-dashboard.mjs" and "never read files one at a time" rules are good agent guardrails.

3. references/dashboards/plugins/build/impl.md (404 lines)

Severity: OK

Thorough build specification. The phased approach (preflight → plan → approval → setup → cross-check → build) is well-documented. The plan format template with icon conventions and rendering descriptions is a strong LLM usability feature. Phase 3.5 cross-check against documented example responses is a smart correctness mechanism. The subagent spawn prompt is complete.

4. references/dashboards/plugins/deploy/impl.md (273 lines)

Severity: OK

Solid deploy pipeline specification. The "no --version on deploy" and "no --path-name on upgrade" rules address the most common deploy failures. Version auto-bump retry logic is sound.

5. references/dashboards/primitives/tier-resolution.md (451 lines)

Severity: OK

Comprehensive metric catalog covering T0 (hard refuse), T1 (catalog with display hints), T2 (parametric), and T3 (custom). The packageType vs sourceType canonical trap is well-documented with correct/incorrect code examples. All calling convention differences between services are documented.

6. assets/scripts/build-dashboard.mjs (1907 lines)

Severity: OK

No eval() calls, no secrets, no personal paths, no cross-skill imports. All execSync calls use hardcoded commands or resolve()-d paths. Implements the intent validation, scaffold extraction, env writing, metric typecheck, widget generation, and tsc backstop as documented.

7. assets/scripts/setup-admin-folder.mjs (144 lines)

Severity: Medium (security)

Shell injection surface at line 28. The `uip()` helper uses ``execSync(`uip ${args} --output json`)`` with string interpolation. At line 64 and line 96, `FOLDER_NAME` (from `process.argv[2]`) is interpolated: ``uip(`or folders create "${FOLDER_NAME}"`)``. A folder name like `"; rm -rf /; echo "` would inject shell commands. Practical risk is low (the agent controls the argument), but `execFileSync('uip', [...argArray, '--output', 'json'])` would close the vector entirely.
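A minimal sketch of the argument-array approach, assuming the helper only needs to append `--output json` and parse the JSON result. The names `makeUip`, `Exec`, and `calls` are hypothetical and not taken from setup-admin-folder.mjs; the executor is injected so the example runs without spawning a process. In the script it would presumably be Node's `execFileSync` from `node:child_process`, which passes each argument to the program without going through a shell.

```typescript
// Hypothetical sketch, not the actual setup-admin-folder.mjs code.
// The executor is injected so this runs without spawning a process.
// In the script it would be something like:
//   (file, args) => execFileSync(file, args, { encoding: "utf8" })
type Exec = (file: string, args: string[]) => string;

function makeUip(exec: Exec): (args: string[]) => unknown {
  // Each argument stays a separate argv entry, so no shell ever parses it.
  return (args) => JSON.parse(exec("uip", [...args, "--output", "json"]));
}

// Fake executor that records what it was asked to run.
const calls: string[][] = [];
const uip = makeUip((file, args) => {
  calls.push([file, ...args]);
  return "{}";
});

// The hostile folder name from the review arrives as one literal argument.
const hostileName = '"; rm -rf /; echo "';
uip(["or", "folders", "create", hostileName]);
```

Because the folder name is never concatenated into a command string, quoting and escaping stop being the caller's problem.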

8. assets/scripts/capability-registry.json (608 lines)

Severity: OK

Valid JSON with T1 metrics, T2 parametric metrics, and hard-refuse entries. All metrics reference correct SDK services and subpaths. Aliases provide natural-language trigger matching.
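As a rough illustration of alias-based matching across the tiers, the sketch below resolves a request against a tiny in-memory registry. The entry shape, the `resolveTier` function, and every metric id and alias except "faulted jobs" are made up for this example; the real capability-registry.json schema and the skill's matching logic may differ.

```typescript
// Hypothetical registry shape; the real capability-registry.json may differ.
type Tier = "T0" | "T1" | "T2" | "T3";

interface RegistryEntry {
  id: string;
  tier: "T0" | "T1" | "T2"; // T0 = hard refuse, T1 = catalog, T2 = parametric
  aliases: string[];
}

// Illustrative entries only.
const registry: RegistryEntry[] = [
  { id: "faulted-jobs", tier: "T1", aliases: ["faulted jobs", "failed jobs"] },
  { id: "jobs-by-folder", tier: "T2", aliases: ["jobs by folder"] },
  { id: "robot-cpu-usage", tier: "T0", aliases: ["cpu usage"] },
];

function resolveTier(request: string): { tier: Tier; id?: string } {
  const text = request.toLowerCase();
  for (const entry of registry) {
    for (const alias of entry.aliases) {
      if (text.indexOf(alias) !== -1) {
        return { tier: entry.tier, id: entry.id };
      }
    }
  }
  // No catalog or hard-refuse match: treat the request as a custom (T3) metric.
  return { tier: "T3" };
}
```

Substring matching is the simplest possible stand-in for natural-language trigger matching; it is used here only to keep the sketch short.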

9. assets/scripts/tests/resolution.test.mjs (1367 lines)

Severity: OK

Thorough test suite using Node.js built-in node:test and node:assert/strict. Tests cover intent validation, metric resolution, widget generation, edit classification, event parsing, column compilation. Includes regression guards on source code structure.

10. New SDK references (agents.md, traces.md, governance.md, governance-traces.md)

Severity: OK

All four follow kebab-case naming, have properly fenced code blocks with language identifiers, contain no cross-skill references, no secrets. Each documents calling conventions (positional Date args vs options objects), response shapes (bare arrays vs .items), and example responses. governance-traces.md properly documents the gated/interim nature.
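The two response shapes matter because a metric module written for one silently breaks on the other. A small normalizer, sketched here under the assumption that the wrapped shape is exactly `{ items: T[] }`, shows one way to handle both. `ListResponse` and `toItems` are hypothetical names and are not part of the @uipath/uipath-typescript SDK.

```typescript
// Hypothetical helper: accepts either response shape the references describe.
type ListResponse<T> = T[] | { items: T[] };

function toItems<T>(response: ListResponse<T>): T[] {
  // A bare array is returned as-is; a wrapped response is unwrapped via .items.
  return Array.isArray(response) ? response : response.items;
}

// Usage with stand-in data (not real SDK responses).
const fromBareArray = toItems([{ name: "agent-a" }, { name: "agent-b" }]);
const fromWrapped = toItems({ items: [{ name: "job-1" }] });
```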

11. Modified SDK references (orchestrator.md, maestro.md, imports.md, pagination.md)

Severity: OK

orchestrator.md adds the critical packageType vs sourceType job classification section with example response, field mappings, and filterable vs read-only field warnings. This is a high-quality addition that prevents a common agent mistake. maestro.md adds Maestro Insights RTM methods with module patterns. imports.md adds the four new subpaths. pagination.md adds a dashboard helper tip.

12. references/oauth-scopes.md — scope table updates

Severity: OK

New sections for Agents, Agent Traces, Agent Memory, Governance, and Maestro Insights RTM scopes. Common Scope Bundles table updated. Clean structure.

13. references/pack-publish-deploy.md — deploy flag guidance

Severity: OK

Minor but useful: adds a warning to omit -v on deploy, which avoids the "not published yet" race with catalog indexing.

14. Changes to skills/uipath-rpa/ — deletion of uia-elements-interaction-guide.md

Severity: OK

uia-elements-interaction-guide.md (93 lines) is deleted. All references to it in ui-automation-guide.md and SKILL.md have been cleanly removed — grep confirms zero remaining references. The key pitfall (web dropdown native vs custom) is simplified to one line in ui-automation-guide.md. The broader date-input and disabled-button sections are dropped.

The Task Navigation table row for "Drive a captured control" (referencing the deleted guide) is removed. The Delay pitfall loses its cross-reference to the deleted guide's DelayBefore/DelayAfter guidance.

Note for manual review: The deleted content about date-input format detection and disabled-button DelayBefore was substantive agent guidance. The simplification in ui-automation-guide.md is much terser ("use TypeInto instead"). If agents still encounter these control types, the detail may be missed.

15. skills/uipath-solution/references/solution-overview.md

Severity: OK

Removes one > blockquote about .uis bundles being zip archives. Minimal change, no link breakage.

16. Test suite — 13 dashboard tasks + activation lines + shared validator

Severity: Low

Tier coverage: 6 smoke + 6 integration + 1 e2e — meets ≥1 smoke + ≥1 e2e. All 13 tasks have required tags (uipath-coded-apps as first tag, tier, mode:*), valid task_id patterns, and behavioral success criteria. No @uipath/cli in env_packages. Prompts are minimal.

Activation lines: 12 new entries (IDs 051–062) covering dashboard-related prompts. All correctly set expected_skill: uipath-coded-apps.

Shared validator (_shared/check_dashboard.py, 242 lines): Well-structured. Validates project structure, OAuth env vars, intent.json purity (no fnBody), metric modules, widget generation, rowLink wiring, detail views, and optional tsc --noEmit.
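The intent.json purity check can be pictured as a recursive key scan. The sketch below restates the idea in TypeScript for illustration only: the actual validator is check_dashboard.py, and `findForbiddenKey` and `isPureIntent` are hypothetical names. It assumes purity means `schemaVersion` is 2 and no `fnBody` key appears at any depth.

```typescript
// Hypothetical restatement of the purity check (the real one is in Python).
function findForbiddenKey(value: unknown, forbidden: string, path = "$"): string | null {
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      const hit = findForbiddenKey(value[i], forbidden, `${path}[${i}]`);
      if (hit) return hit;
    }
    return null;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      if (key === forbidden) return `${path}.${key}`;
      const hit = findForbiddenKey(obj[key], forbidden, `${path}.${key}`);
      if (hit) return hit;
    }
  }
  return null; // primitives and clean objects carry no forbidden key
}

function isPureIntent(intent: unknown): boolean {
  const version = (intent as { schemaVersion?: unknown } | null)?.schemaVersion;
  return version === 2 && findForbiddenKey(intent, "fnBody") === null;
}

// Stand-in documents, not real intent.json files.
const pureIntent = { schemaVersion: 2, metrics: [{ name: "faulted-jobs" }] };
const impureIntent = { schemaVersion: 2, metrics: [{ name: "faulted-jobs", fnBody: "return 1" }] };
```

Returning the offending path rather than a bare boolean makes a failure message point at the exact metric descriptor that embedded code.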

Minor issue — dashboard_plan_gate.yaml thin coverage. Only 1 criterion (command_executed checking npm ci was NOT run) with no positive anchor. No skill_triggered check. Agent inaction or crash satisfies the single negative assertion. dashboard_plan_before_question.yaml already subsumes its assertion with 4 criteria.

Minor issue — dashboard_disambiguate.yaml negative-only criteria. Both criteria are negative (no npm ci, no package.json). Agent inaction satisfies all assertions.
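To see why negative-only criteria are weak, consider a toy grading model. This is a hypothetical sketch: `Transcript`, `Criterion`, and the three checks mirror the review's wording and are not the eval framework's real schema. An agent that crashes or does nothing produces an empty transcript, which passes every "did not happen" check vacuously.

```typescript
// Hypothetical model of task grading, for illustration only.
interface Transcript {
  commands: string[];        // shell commands the agent ran
  filesCreated: string[];    // files the agent wrote
  skillsTriggered: string[]; // skills the agent activated
}

type Criterion = (t: Transcript) => boolean;

// Negative criteria: true when the unwanted thing did NOT happen.
const noNpmCi: Criterion = (t) => t.commands.every((c) => c.indexOf("npm ci") === -1);
const noPackageJson: Criterion = (t) => t.filesCreated.indexOf("package.json") === -1;

// Positive anchor: true only when the agent actually engaged the skill.
const skillTriggered: Criterion = (t) => t.skillsTriggered.indexOf("uipath-coded-apps") !== -1;

const passes = (criteria: Criterion[], t: Transcript): boolean =>
  criteria.every((check) => check(t));

// An agent that crashed or did nothing produces an empty transcript.
const idle: Transcript = { commands: [], filesCreated: [], skillsTriggered: [] };

const negativeOnly = passes([noNpmCi, noPackageJson], idle); // passes vacuously
const withAnchor = passes([noNpmCi, noPackageJson, skillTriggered], idle); // fails
```

Adding a single positive criterion is enough to make inaction fail, which is the point of the recommended skill_triggered anchor.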

17. .gitattributes + binary .zip fixture

Severity: Low

.gitattributes correctly marks *.zip binary. The governance-dashboard-starter-kit.zip (67KB) is the React scaffold extracted at build time. .github/code_review.md states "No binary files or images committed" — this is a pragmatic tradeoff vs checking in 100+ scaffold source files. The .version file alongside it provides a version tracking mechanism.


What's Missing

  • Task Navigation table gaps (Medium): The four new SDK references (agents.md, traces.md, governance.md, governance-traces.md) are reachable through the dashboard path but missing from SKILL.md's primary index table, limiting discoverability for the non-dashboard app-building path.
  • Positive anchors on 2 smoke tests (Low): dashboard_plan_gate.yaml and dashboard_disambiguate.yaml lack a skill_triggered or other positive criterion — agent inaction passes all their assertions.

Area Ratings

| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | Valid YAML, 362 chars, proper redirects, Task tool added, no stale status markers |
| E2E Tests | OK | 6 smoke + 6 integration + 1 e2e; shared validator is solid; 2 thin tests noted |
| Skill Body | Issue | New SDK refs missing from Task Navigation table |
| References & Assets | OK | Comprehensive, self-contained, no cross-skill refs, no broken links, good token optimization |
| Repo Hygiene | Issue | Shell injection surface in setup-admin-folder.mjs; binary .zip committed (pragmatic tradeoff) |

Issues for Manual Review

  • Critical Rule 7 narrowing: The old rule ("Never handle access tokens manually") was broader than the new rule ("Never pass access tokens as CLI flags"). Confirm the narrower scope is intentional.
  • SDK method existence: The new Insights RTM methods (Agents.getAll, AgentTraces.getErrorsTimeline, AgentMemory.getTimeline, Governance.getPolicyTraces, Maestro Insights methods) — verify these exist in the installed @uipath/uipath-typescript version (≥ 1.4.1 per the references).
  • uipath-rpa content reduction: The deleted uia-elements-interaction-guide.md contained date-input format detection and disabled-button DelayBefore guidance. The simplified one-liners in ui-automation-guide.md may not give agents enough detail for those control types.

Conclusion

Well-architected feature addition with strong documentation discipline and thorough test coverage. The compiler model (intent.json + metric modules → type-checked dashboard) is solid, and the reference documentation is prescriptive and LLM-friendly.

Two items should be fixed before merge:

  1. Fix the shell injection surface in setup-admin-folder.mjs — replace execSync string interpolation with execFileSync argument arrays.
  2. Add the four new SDK references to the SKILL.md Task Navigation table.

One item is recommended (Low):
3. Add skill_triggered to dashboard_plan_gate.yaml and dashboard_disambiguate.yaml as positive anchors.

Everything else (test structure, SDK references, build script, capability registry, dashboard reference tree, activation lines) is solid.

@NishankSiddharth NishankSiddharth force-pushed the feat/coded-apps-dashboards branch 11 times, most recently from 7a76d24 to da12230 Compare June 20, 2026 14:58
@NishankSiddharth NishankSiddharth force-pushed the feat/coded-apps-dashboards branch 7 times, most recently from f35ff7b to 6b87b85 Compare June 22, 2026 09:35
@NishankSiddharth

NishankSiddharth commented Jun 22, 2026


@NishankSiddharth NishankSiddharth force-pushed the feat/coded-apps-dashboards branch 4 times, most recently from 5a0dbad to d06087e Compare June 22, 2026 17:26
NishankSiddharth and others added 2 commits June 23, 2026 00:21
…lates, eject, plugin config

Add the dashboard generation surface to the uipath-coded-apps skill: a
natural-language → React/TypeScript dashboard pipeline built on a compiler
model.

- Compiler architecture: intent.json is pure metadata (schemaVersion 2, no
  fnBody); each metric is a real TS module under src/metrics/ exporting
  fetchData; widgets/views are generated under src/dashboard/. build-dashboard.mjs
  drives scaffold extraction, two-stage tsc, and the Vite/React/Tailwind/Recharts
  app.
- Incremental editing: state.json (schemaVersion 2) carries versions + regime +
  widgets + deployment; edit ops (ADD/REMOVE/CHANGE/REBUILD/UPGRADE) regenerate
  from persisted metadata + on-disk modules.
- Templates + eject: regime marker (compiler-managed | ejected); EJECT flips
  one-way and the edit-script then refuses (EJECTED_PROJECT); --pack-template
  stages a tenant-neutral modify-face + template.json into dist/_source for a
  single reusable artifact.
- SDK config via the @uipath/coded-apps-dev Vite plugin: the build writes
  uipath.json (full tenant config for normal builds, scope-only for templates);
  uipathCodedApps() injects <meta name="uipath:*"> tags into index.html and the
  SDK (new UiPath()) reads its config from those tags — no VITE_*/.env.local.
  packTemplate blanks all tenant identity since dist/_source is web-served.
- Auth: AuthProvider is the single OAuth driver (host-injected token when
  embedded, OAuth-PKCE locally); no logout-on-cleanup, no competing
  sdk.initialize() — fixes the StrictMode redirect loop.
- --prewarm resolves its project dir with resolve() (not join()), honoring
  absolute paths.
- Single-zip starter-kit fixture (v2.9.0), version read from inside the zip.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017ycqjFGHJp8imtuSdUocgs
…cture

Add the dashboard test suite: smoke / integration / e2e tasks plus a shared
check_dashboard.py validator.

- check_dashboard.py asserts the compiler-model shape (pure-metadata intent.json
  at schemaVersion 2, metric modules exporting fetchData, generated widgets) and
  the SDK config contract (uipath.json keys + uipathCodedApps() wired in
  vite.config.ts); auto-locates the project under its <routingName> subdir.
- Tasks cover scaffold/build, incremental edit (change/remove), detail views,
  diagnose (broken metric), governance gating + no-regression, refuse-invocation,
  and live e2e, side-effect graded throughout.
- Per-task node_modules prune keeps preserved artifacts small.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017ycqjFGHJp8imtuSdUocgs
@NishankSiddharth NishankSiddharth force-pushed the feat/coded-apps-dashboards branch from d06087e to b91bf75 Compare June 22, 2026 18:52
…out, KPI drill-down, fetchDetail enforcement

Chart drill-downs silently fell back to the chart's own aggregate buckets unless
the author hand-wired detail, and KPI drill-downs forced an eject. Now:

- Build hard-fails with CHART_DETAIL_MISSING when a widget that exposes a
  drill-down omits `export const fetchDetail` — the chart analogue of the
  existing T3-table-without-columns gate. Single decision point:
  widgetGetsDetailView().
- Charts default to a record-grain drill-down. 9 detail-capable charts gain a
  detailRecipe (the exact fetchDetail SDK call) + default detailColumns. 8
  timeline/aggregate charts opt out with "noDetail": true (registry entry for
  T1/T2; the metric in intent.json for a T3 custom chart — no registry entry).
- KPI cards drill down without ejecting. Cataloged KPIs with a feasible record
  query DEFAULT to a drill-down (active-agents, success-rate, agent-units,
  governance-violations) via registry defaults.detail + detailRecipe +
  detailColumns; a T3 KPI opts in with detail:true; either is suppressible with
  detail:false. buildViewSpec runs the fetchDetail export and inherits registry
  detailColumns.
- Plan template names the drill-down ("click → the individual faulted runs"),
  sourced from detailRecipe; noDetail charts omit it.
- Refreshed starter-kit fixture (2.13.0): runtime-guarded chart cards, KPI-clickable
  t3-shell, shared @/lib/governance-scan. Docs updated (tier-resolution, detail-views,
  layout-patterns Rule 10, impl plan, customization ejected-metrics note,
  governance-traces uses the shared scanners). +11 unit tests (158 pass).

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SxzF5eiKdpCXiEVVghuuut
@NishankSiddharth NishankSiddharth force-pushed the feat/coded-apps-dashboards branch from b678700 to 3461c72 Compare June 23, 2026 10:38
NishankSiddharth and others added 2 commits June 23, 2026 18:48
…om dashboard task prompts

A coder-eval prompt-quality review flagged the dashboard suite for insider framing
that weakens the eval. Prompt-text only — every success_criteria, description, and
comment is unchanged, so grading is identical and the tests stay deterministic:

- Remove "Before starting, load the uipath-coded-apps skill …" from all 16 prompts.
  Routing to the right skill is exactly what the skill_triggered criterion tests;
  naming the skill pre-satisfied it. Removal makes routing genuinely under test.
- Strip "author intent.json and the metric modules" from the 7 build prompts — the
  agent should derive those artifact shapes from the skill, not be told. The
  build-to-completion / no-question control lines stay.
- diagnose: "fix the metric module" -> "fix it" (state the symptom, not the noun).

Leaves auth_no_manual_tokens untouched (pre-existing, not part of this suite).

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SxzF5eiKdpCXiEVVghuuut
…raming from edit/incremental prompts

Follow-up to the prompt-quality review (took the dashboard suite from 16 Medium to
13 Low; these 3 were the last Medium). The phrase "Treat this as an incremental
edit of the already-built dashboard, not a rebuild from scratch" telegraphs the
internal incremental-editor path (edit-intent.json) the checker grades — whether
the agent edits in place vs re-scaffolds is exactly what's under test, so it must
not be handed over.

- edit_change: "…change that widget … Treat this as an incremental edit … not a
  rebuild from scratch." -> "Then change that same widget to a bar chart."
- edit_remove: same softening -> "Then remove the faulted jobs widget."
- incremental_add: -> "Then add a second widget — faulted jobs — to it."
- Drop "(the build, then the incremental edit)" from the shared control block.

Prompt-text only; success_criteria (edit-intent.json present, module counts) and
the non-interactive control lines are unchanged, so grading stays deterministic.

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SxzF5eiKdpCXiEVVghuuut