feat(coded-apps): NLP-driven dashboard generation (compiler architecture) + coder-eval suite #1591
NishankSiddharth wants to merge 5 commits into
Claude finished @NishankSiddharth's task in 10m 49s
PR Review: feat(coded-apps): NLP-driven dashboard generation (compiler architecture) + coder-eval suite
Summary
Adds NLP-driven dashboard generation to the …
Change-by-Change Review
1.
| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | Valid YAML, 362 chars, proper → redirects, Task tool added |
| E2E Tests | OK | 6 smoke + 6 integration + 1 e2e; all pass structural checks; shared validator is solid |
| Skill Body | Issue | > **Preview** callout violates single-source-of-truth rule; new SDK refs missing from Task Navigation |
| References & Assets | OK | Comprehensive, self-contained, no cross-skill refs, no broken links, good token optimization |
| Repo Hygiene | Issue | Shell injection surface in setup-admin-folder.mjs; binary .zip committed (pragmatic but conflicts with guideline) |
Issues for Manual Review
- Buckets scope change (oauth-scopes.md): The Buckets service scopes changed from `OR.Buckets`/`OR.Buckets.Read` to `OR.Administration`/`OR.Administration.Read`. This affects all non-dashboard apps using Buckets. Verify this reflects an actual SDK/API change.
- Critical Rule 7 narrowing: The old rule ("Never handle access tokens manually", which covered sourcing, printing, parsing, and setting cached tokens) was broader than the new rule ("Never pass access tokens as CLI flags", a JWT length concern). The deleted `auth_no_manual_tokens.yaml` test validated the broader rule. Confirm the narrower rule is intentional.
- SDK method existence: The new Insights RTM methods (`Agents.getAll`, `AgentTraces.getErrorsTimeline`, `AgentMemory.getTimeline`, `Governance.getPolicyTraces`, etc.) and Maestro Insights methods (`Cases.getTopRunCount`, `MaestroProcesses.getInstanceStatusTimeline`, `CaseInstances.getSlaSummary`, etc.): verify these exist in the installed `@uipath/uipath-typescript` version (≥ 1.4.1 per the references).
Conclusion
This is a well-architected feature addition with strong documentation discipline and thorough test coverage. The compiler model (intent.json + metric modules → type-checked dashboard) is a solid approach, and the reference documentation is prescriptive and LLM-friendly.
Two items should be fixed before merge:
1. Remove the `> **Preview**` callout from SKILL.md line 9 (violates the lifecycle status single-source-of-truth rule and will fail CI validation).
2. Fix the shell injection surface in `setup-admin-folder.mjs`: replace `execSync` string interpolation with `execFileSync` argument arrays.
One item is recommended (Medium):
3. Add the four new SDK references to the SKILL.md Task Navigation table.
Everything else (test structure, SDK references, build script, capability registry, dashboard reference tree) is solid.
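The shell injection fix requested above can be illustrated with a minimal sketch. The contents of `setup-admin-folder.mjs` are not shown in this thread, so the helper name `echoFolderName` and the command it runs (the current Node binary echoing its first argument) are stand-ins chosen purely for illustration. What it shows is the difference between interpolating a value into an `execSync` command string and passing it as a separate `execFileSync` argument.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical stand-in for a CLI call in setup-admin-folder.mjs.
// Unsafe form (comment only): execSync(`some-cli --name ${folderName}`)
// hands the whole string to a shell, so a value such as `x; rm -rf .`
// would run a second command.

// Safer form: no shell is spawned, and each array element reaches the child
// process as one literal argument, whatever characters it contains.
export function echoFolderName(folderName: string): string {
  const script = "process.stdout.write(process.argv[1])";
  return execFileSync(process.execPath, ["-e", script, folderName], {
    encoding: "utf8",
  });
}

// A value full of shell metacharacters comes back unchanged.
const hostile = 'Admin"; echo injected; "';
console.log(echoFolderName(hostile) === hostile);
```

The argument-array form also removes the need for any quoting or escaping logic in the script itself.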
Claude finished @NishankSiddharth's task in 4m 14s
Coder-eval task lint (advisory)
13 task YAMLs changed; verdicts: 0 Critical, 0 High, 1 Medium, 2 Low, 10 OK. Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.
Evidence of passing run
❌ High: PR body describes the test design and coverage approach but does not claim the changed tasks have been run and passed. Please edit the PR description to add a line like:
Per-task lint
03f1607 to b938bcc
@claude please review the PR again; the earlier files diff was not correct.
|
Claude finished @NishankSiddharth's task in 8m 3s
PR Review: feat(coded-apps): NLP-driven dashboard generation (compiler architecture) + coder-eval suite
Summary
Adds NLP-driven dashboard generation to …
Change-by-Change Review
1.
| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | Valid YAML, 362 chars, proper → redirects, Task tool added, no stale status markers |
| E2E Tests | OK | 6 smoke + 6 integration + 1 e2e; shared validator is solid; 2 thin tests noted |
| Skill Body | Issue | New SDK refs missing from Task Navigation table |
| References & Assets | OK | Comprehensive, self-contained, no cross-skill refs, no broken links, good token optimization |
| Repo Hygiene | Issue | Shell injection surface in setup-admin-folder.mjs; binary .zip committed (pragmatic tradeoff) |
Issues for Manual Review
- Critical Rule 7 narrowing: The old rule ("Never handle access tokens manually") was broader than the new rule ("Never pass access tokens as CLI flags"). Confirm the narrower scope is intentional.
- SDK method existence: The new Insights RTM methods (`Agents.getAll`, `AgentTraces.getErrorsTimeline`, `AgentMemory.getTimeline`, `Governance.getPolicyTraces`, Maestro Insights methods): verify these exist in the installed `@uipath/uipath-typescript` version (≥ 1.4.1 per the references).
- uipath-rpa content reduction: The deleted `uia-elements-interaction-guide.md` contained date-input format detection and disabled-button `DelayBefore` guidance. The simplified one-liners in `ui-automation-guide.md` may not give agents enough detail for those control types.
Conclusion
Well-architected feature addition with strong documentation discipline and thorough test coverage. The compiler model (intent.json + metric modules → type-checked dashboard) is solid, and the reference documentation is prescriptive and LLM-friendly.
Two items should be fixed before merge:
1. Fix the shell injection surface in `setup-admin-folder.mjs`: replace `execSync` string interpolation with `execFileSync` argument arrays.
2. Add the four new SDK references to the SKILL.md Task Navigation table.
One item is recommended (Low):
3. Add skill_triggered to dashboard_plan_gate.yaml and dashboard_disambiguate.yaml as positive anchors.
Everything else (test structure, SDK references, build script, capability registry, dashboard reference tree, activation lines) is solid.
7a76d24 to da12230
f35ff7b to 6b87b85
5a0dbad to d06087e
…lates, eject, plugin config
Add the dashboard generation surface to the uipath-coded-apps skill: a natural-language → React/TypeScript dashboard pipeline built on a compiler model.
- Compiler architecture: intent.json is pure metadata (schemaVersion 2, no fnBody); each metric is a real TS module under src/metrics/ exporting fetchData; widgets/views are generated under src/dashboard/. build-dashboard.mjs drives scaffold extraction, two-stage tsc, and the Vite/React/Tailwind/Recharts app.
- Incremental editing: state.json (schemaVersion 2) carries versions + regime + widgets + deployment; edit ops (ADD/REMOVE/CHANGE/REBUILD/UPGRADE) regenerate from persisted metadata + on-disk modules.
- Templates + eject: regime marker (compiler-managed | ejected); EJECT flips one-way and the edit-script then refuses (EJECTED_PROJECT); --pack-template stages a tenant-neutral modify-face + template.json into dist/_source for a single reusable artifact.
- SDK config via the @uipath/coded-apps-dev Vite plugin: the build writes uipath.json (full tenant config for normal builds, scope-only for templates); uipathCodedApps() injects <meta name="uipath:*"> tags into index.html and the SDK (new UiPath()) reads its config from those tags, with no VITE_*/.env.local. packTemplate blanks all tenant identity since dist/_source is web-served.
- Auth: AuthProvider is the single OAuth driver (host-injected token when embedded, OAuth-PKCE locally); no logout-on-cleanup, no competing sdk.initialize(). This fixes the StrictMode redirect loop.
- --prewarm resolves its project dir with resolve() (not join()), honoring absolute paths.
- Single-zip starter-kit fixture (v2.9.0), version read from inside the zip.
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017ycqjFGHJp8imtuSdUocgs
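The `--prewarm` item in the commit above turns on a small difference between two Node `path` functions. The sketch below is not taken from `build-dashboard.mjs`; the helper names are hypothetical, and it uses `path.posix` so the behaviour is the same on every platform. It only shows why `join()` mishandles an absolute project directory while `resolve()` honors it.

```typescript
import { posix } from "node:path";

// Hypothetical helpers; only the join-vs-resolve behaviour is the point.
export function projectDirWithJoin(cwd: string, projectArg: string): string {
  // join() concatenates and normalizes, so an absolute projectArg
  // ends up nested under cwd.
  return posix.join(cwd, projectArg);
}

export function projectDirWithResolve(cwd: string, projectArg: string): string {
  // resolve() works from right to left and stops at the first absolute
  // segment, so an absolute projectArg wins over cwd.
  return posix.resolve(cwd, projectArg);
}

console.log(projectDirWithJoin("/work", "/tmp/dash"));    // /work/tmp/dash
console.log(projectDirWithResolve("/work", "/tmp/dash")); // /tmp/dash
console.log(projectDirWithResolve("/work", "dash"));      // /work/dash
```

Relative arguments behave the same under both functions, which is why a bug of this kind tends to surface only when a caller passes an absolute path.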
…cture
Add the dashboard test suite: smoke / integration / e2e tasks plus a shared check_dashboard.py validator.
- check_dashboard.py asserts the compiler-model shape (pure-metadata intent.json at schemaVersion 2, metric modules exporting fetchData, generated widgets) and the SDK config contract (uipath.json keys + uipathCodedApps() wired in vite.config.ts); auto-locates the project under its <routingName> subdir.
- Tasks cover scaffold/build, incremental edit (change/remove), detail views, diagnose (broken metric), governance gating + no-regression, refuse-invocation, and live e2e, side-effect graded throughout.
- Per-task node_modules prune keeps preserved artifacts small.
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017ycqjFGHJp8imtuSdUocgs
d06087e to b91bf75
…out, KPI drill-down, fetchDetail enforcement
Chart drill-downs silently fell back to the chart's own aggregate buckets unless
the author hand-wired detail, and KPI drill-downs forced an eject. Now:
- Build hard-fails with CHART_DETAIL_MISSING when a widget that exposes a
drill-down omits `export const fetchDetail` — the chart analogue of the
existing T3-table-without-columns gate. Single decision point:
widgetGetsDetailView().
- Charts default to a record-grain drill-down. 9 detail-capable charts gain a
detailRecipe (the exact fetchDetail SDK call) + default detailColumns. 8
timeline/aggregate charts opt out with "noDetail": true (registry entry for
T1/T2; the metric in intent.json for a T3 custom chart — no registry entry).
- KPI cards drill down without ejecting. Cataloged KPIs with a feasible record
query DEFAULT to a drill-down (active-agents, success-rate, agent-units,
governance-violations) via registry defaults.detail + detailRecipe +
detailColumns; a T3 KPI opts in with detail:true; either is suppressible with
detail:false. buildViewSpec runs the fetchDetail export and inherits registry
detailColumns.
- Plan template names the drill-down ("click → the individual faulted runs"),
sourced from detailRecipe; noDetail charts omit it.
- Refreshed starter-kit fixture (2.13.0): runtime-guarded chart cards, KPI-clickable
t3-shell, shared @/lib/governance-scan. Docs updated (tier-resolution, detail-views,
layout-patterns Rule 10, impl plan, customization ejected-metrics note,
governance-traces uses the shared scanners). +11 unit tests (158 pass).
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SxzF5eiKdpCXiEVVghuuut
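The drill-down rules in the commit above (charts default to a detail view unless `noDetail` is set, KPIs follow a registry default that `detail` can override, and a missing `fetchDetail` export fails the build) can be sketched as one decision function plus a gate. This is a reading of the commit message, not the code in the PR: the `WidgetSpec` shape and its field names are assumptions, and only `widgetGetsDetailView`, `noDetail`, `detail`, `fetchDetail`, and `CHART_DETAIL_MISSING` come from the text.

```typescript
type WidgetKind = "chart" | "kpi";

// Assumed shape, flattened for illustration.
interface WidgetSpec {
  id: string;
  kind: WidgetKind;
  noDetail?: boolean;              // chart opt-out ("noDetail": true)
  detail?: boolean;                // KPI opt-in (true) or suppression (false)
  registryDetailDefault?: boolean; // stand-in for registry defaults.detail
  exportsFetchDetail: boolean;     // module has `export const fetchDetail`
}

// Single decision point: does this widget expose a drill-down?
export function widgetGetsDetailView(w: WidgetSpec): boolean {
  if (w.kind === "chart") return w.noDetail !== true;
  // KPI: an explicit detail flag wins, otherwise fall back to the registry default.
  if (w.detail !== undefined) return w.detail;
  return w.registryDetailDefault === true;
}

// Build gate: every widget that exposes a drill-down must export fetchDetail.
export function checkDetailGate(widgets: WidgetSpec[]): string[] {
  return widgets
    .filter((w) => widgetGetsDetailView(w) && !w.exportsFetchDetail)
    .map((w) => `CHART_DETAIL_MISSING: ${w.id}`);
}
```

Routing every caller through one predicate means the plan text, the generated views, and the build gate cannot disagree about which widgets are clickable.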
b678700 to 3461c72
…om dashboard task prompts
A coder-eval prompt-quality review flagged the dashboard suite for insider framing that weakens the eval. Prompt-text only: every success_criteria, description, and comment is unchanged, so grading is identical and the tests stay deterministic:
- Remove "Before starting, load the uipath-coded-apps skill …" from all 16 prompts. Routing to the right skill is exactly what the skill_triggered criterion tests; naming the skill pre-satisfied it. Removal makes routing genuinely under test.
- Strip "author intent.json and the metric modules" from the 7 build prompts; the agent should derive those artifact shapes from the skill, not be told. The build-to-completion / no-question control lines stay.
- diagnose: "fix the metric module" -> "fix it" (state the symptom, not the noun).
Leaves auth_no_manual_tokens untouched (pre-existing, not part of this suite).
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SxzF5eiKdpCXiEVVghuuut
…raming from edit/incremental prompts
Follow-up to the prompt-quality review (took the dashboard suite from 16 Medium to 13 Low; these 3 were the last Medium). The phrase "Treat this as an incremental edit of the already-built dashboard, not a rebuild from scratch" telegraphs the internal incremental-editor path (edit-intent.json) the checker grades. Whether the agent edits in place vs re-scaffolds is exactly what's under test, so it must not be handed over.
- edit_change: "…change that widget … Treat this as an incremental edit … not a rebuild from scratch." -> "Then change that same widget to a bar chart."
- edit_remove: same softening -> "Then remove the faulted jobs widget."
- incremental_add: -> "Then add a second widget — faulted jobs — to it."
- Drop "(the build, then the incremental edit)" from the shared control block.
Prompt-text only; success_criteria (edit-intent.json present, module counts) and the non-interactive control lines are unchanged, so grading stays deterministic.
Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SxzF5eiKdpCXiEVVghuuut

What this is
The `uipath-coded-apps` skill gains a natural-language dashboard generation capability. A UiPath admin / CoE lead describes the metrics they want in plain English ("agent health, error-rate trend for the week, top agents by usage") and the skill generates a live, deployable React dashboard wired to their tenant's data through the `@uipath/uipath-typescript` SDK. No hand-coding.
How it works (end-to-end flow)
1. Intent routing. On a dashboard request the skill classifies intent (build / edit / deploy) and fires a parallel "blast" (load the relevant docs, check for an existing dashboard via `state.json`, kick off a background `uip login status`, pre-warm dependencies) before showing anything.
2. Plan: a text-only gate. It resolves every requested metric to a concrete SDK method, refuses impossible metrics inline (with a suggested alternative), and presents a plain-language plan. This turn is pure text, zero tool calls, ending with a confirm/change affordance. (A question/option tool fired in the same turn reliably suppresses the plan rendering, so the gate is deliberately text-only; setup questions like OAuth come after approval.)
3. Compile. On approval the skill authors two things and compiles them:
   - `intent.json`: pure metadata (schemaVersion 2) holding the dashboard name, time range, tenant + OAuth config, and a list of metric descriptors. No embedded code.
   - `metrics/<name>.ts`: one real TypeScript module per metric, each exporting `fetchData(sdk, getToken)` (plus optional drill-down fetchers). Each is written from curated SDK reference docs and cross-checked against the documented example response.
   The build engine then compiles these with a two-stage typecheck: an isolated typecheck of the metric modules, then generation of widgets, routes, and detail views, then a full `tsc --noEmit` backstop. The result is a Vite + React + Tailwind + Recharts app.
4. Run / deploy. The app runs locally behind OAuth-PKCE login; a deploy flow packs, publishes, and deploys it to Automation Cloud as a coded web app, tracking version/routing state across deploys.
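To make the Compile step concrete, here is a minimal sketch of the two authored artifacts: a metadata-only descriptor and a metric module exporting `fetchData(sdk, getToken)`. It is an illustration under assumptions, not the skill's output. `SdkLike` is a stub standing in for the real `@uipath/uipath-typescript` client, and the descriptor fields, the `status` field, and the `"Active"` value are all hypothetical; only the `fetchData` signature and the `getAll` method name appear in the PR.

```typescript
// Stub types: the real SDK surface and intent.json schema are not shown here.
interface AgentRecord {
  name: string;
  status: string;
}
interface SdkLike {
  agents: { getAll(): Promise<AgentRecord[]> };
}
type GetToken = () => Promise<string>;

// Metadata-only descriptor, as one entry of intent.json might look.
// It names the module but carries no code.
export const descriptor = {
  id: "active-agents",
  title: "Active agents",
  module: "metrics/active-agents.ts",
} as const;

// Row shaping kept as a pure function so it can be checked without a tenant.
export function countActive(
  agents: AgentRecord[],
): { label: string; value: number }[] {
  const active = agents.filter((a) => a.status === "Active").length;
  return [{ label: "Active agents", value: active }];
}

// The export every metric module carries, per the description above.
export async function fetchData(sdk: SdkLike, _getToken: GetToken) {
  return countActive(await sdk.agents.getAll());
}
```

Because the module is ordinary TypeScript, a wrong field name in `countActive` is a compile error rather than an empty widget at runtime, which is the property the two-stage typecheck relies on.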
Why it's built this way
- … `.ts` module, the generated data layer is type-checked end-to-end. The classic "compiles green but returns zero rows" failure is caught by a deterministic cross-check of each module against its documented SDK example response, and type errors are repaired in a bounded retry loop.
- … `intent.json`. Keeping the spec declarative lets the engine own all code generation (widgets, routes, drill-downs). Edits, rebuilds, and version upgrades regenerate the disposable layer without clobbering authored logic.
How it's tested
The coder-eval suite exercises the skill exactly as a user would invoke it: a coding agent loads the skill in a sandbox from a plain prompt, follows the workflow to produce real files, and deterministic criteria inspect the artifacts (`intent.json`, metric modules, generated widgets, `.env.local` contract, `state.json`), never a self-written summary. A shared validator (`_shared/check_dashboard.py`) auto-locates the generated project (the skill scaffolds into a per-dashboard `<routingName>/` subdirectory) and checks the full compiler-model structure; the e2e gold gate additionally runs `tsc --noEmit` over the generated app.
Tiers (per the repo taxonomy): smoke = fast PR-gate checks that need no full build (40-turn budget, every PR); integration + e2e = full inline builds, run nightly (200-turn budget). Modes (Coding-Agents scorecard): build (generate/edit), operate (deploy/run), diagnose (investigate/fix).
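The kind of artifact check described above can be sketched in a few lines. The real validator is a Python script (`_shared/check_dashboard.py`) whose source is not shown in this thread; what follows is a hypothetical TypeScript rendering of two of the assertions the PR names (pure-metadata `intent.json` at schemaVersion 2, and metric modules exporting `fetchData`). The function names, the regex, and the message strings are assumptions.

```typescript
interface CheckResult {
  ok: boolean;
  problems: string[];
}

// Recursively look for a key anywhere inside parsed JSON.
function hasKey(value: unknown, key: string): boolean {
  if (Array.isArray(value)) return value.some((v) => hasKey(v, key));
  if (value !== null && typeof value === "object") {
    return Object.entries(value as Record<string, unknown>).some(
      ([k, v]) => k === key || hasKey(v, key),
    );
  }
  return false;
}

export function checkCompilerModel(
  intentJson: string,
  metricSources: Record<string, string>, // file name -> source text
): CheckResult {
  const problems: string[] = [];
  const parsed: unknown = JSON.parse(intentJson);
  const intent = (
    parsed !== null && typeof parsed === "object" ? parsed : {}
  ) as Record<string, unknown>;

  if (intent["schemaVersion"] !== 2) {
    problems.push("intent.json: schemaVersion must be 2");
  }
  if (hasKey(intent, "fnBody")) {
    problems.push("intent.json: embedded code (fnBody) found");
  }

  // Accepts `export function`, `export async function`, and `export const`.
  const exportsFetchData = /export\s+(async\s+)?(function|const)\s+fetchData\b/;
  const files = Object.entries(metricSources);
  if (files.length === 0) problems.push("no metric modules found");
  for (const [file, src] of files) {
    if (!exportsFetchData.test(src)) problems.push(`${file}: no fetchData export`);
  }
  return { ok: problems.length === 0, problems };
}
```

Checks of this shape inspect files on disk rather than the agent's own report, which is what keeps the grading deterministic.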
Test inventory (17 tasks)
- `dashboard_plan_gate`: … `npm ci` before the user confirms.
- `dashboard_plan_before_question`: … `AskUserQuestion`, no `intent.json`, no `npm ci` before approval; OAuth/setup questions fire only after.
- `dashboard_disambiguate`: … `package.json` not created).
- `dashboard_deploy_smoke`: … `-n` is the friendly display name (not the routing slug); never `-t Action` (dashboards are web apps).
- `auth_no_manual_tokens`
- `dashboard_scaffold`: … `intent.json` v2, metric modules exporting `fetchData`, generated widgets, and the 6-var OAuth-PKCE `.env.local` contract.
- `dashboard_full_e2e`: … `getAll` + `getConsumptionTimeline` + `getErrors`), named time constants, `state.json`, and a clean `tsc --noEmit` gold gate.
- `dashboard_agent_job_classification`: … `ProcessType eq 'Agent'`; forbids `sourceType` and filtering on the SDK-mapped `processName`.
- `dashboard_refuse_invocation_timeline`: … `getInvocation*` SDK method.
- `dashboard_gov_gated_routing`: … `parseGovernanceSpans` + `Traces.getById`).
- `dashboard_gov_generic_no_regression`: … `getPolicyTraces`/`getOperationSummary`) and does not regress into the trace-derived family.
- `dashboard_rowlink_clickable`: … `ROW_LINK_KEY` + `onRowClick` handler and generates a `*DetailView.tsx` drill-down page.
- `dashboard_edit_change_widget`: … `edit-intent.json` CHANGE: regenerates in place, reuses the kit (no re-scaffold).
- `dashboard_edit_remove_widget`
- `dashboard_incremental_add`: … `edit-intent.json`, reusing the already-extracted kit.
- `dashboard_diagnose_broken_metric`: … `METRICS_RETRY`/typecheck error and fixes the built module to the real `getAll`.
- `dashboard_deploy_state`: … `systemName` empty in `state.json`): resolves the folder and deploys without a version bump.
Tier split: 5 smoke · 10 integration · 2 e2e. Mode coverage: build, operate, and diagnose all represented.
Generated with Claude Code