feat(dsql): Add system diagnostics workflow (Workflow 12)#207
Open
Morlej wants to merge 4 commits into
Add CloudWatch AAS-based system diagnostics to the DSQL skill. Uses PromQL queries against db.active_sessions.avg to detect temporal anomalies in wait event distribution and identify regressed queries, then routes to Workflow 9 (Query Plan Explainability) for per-query investigation.

OTel attribute names use the proposed naming convention:
- db.wait.event, db.wait.class, db.session.state
- db.query.id, db.query.normalized_text
- aws.auroradsql.session.role.arn, application.name

New files:
- references/system-diagnostics/workflow.md: 6 diagnostic sub-workflows
- references/system-diagnostics/wait-events.md: canonical wait event reference
- references/system-diagnostics/promql-patterns.md: reusable PromQL templates

Also:
- Adds cloudwatch MCP server to .mcp.json (disabled by default)
- Bumps plugin version to 1.5.0
Add a decision table before Common Workflows that routes performance complaints to Workflow 12 (System Diagnostics) instead of allowing them to fall through to Workflow 9 (Query Plan Explainability) directly. Rule: when in doubt, start with Workflow 12 — it identifies specific queries and routes to Workflow 9 with context.
- Use correct get_promql_label_values syntax with match parameter
- Add note that calls without match filter return empty
- Add PromQL syntax rules: quote labels with dots/@, use __name__ selector
- Add explicit discovery step to Workflow 1
- Fix promql-patterns.md to show actual tool parameter names (label_name, match)
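The quoting rules in this commit can be sketched as a small helper. This is only an illustration, assuming the selector form `{__name__="...", "dotted.label"="..."}` is what the CloudWatch PromQL endpoint accepts; `build_selector` is a hypothetical helper, the `active` state value is a placeholder, and passing `match` as a single string is an assumption (only the parameter names `label_name` and `match` come from the commit message).

```python
import re

# Label names made only of [A-Za-z0-9_] (not starting with a digit) need no quotes.
_PLAIN_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and embedded quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_selector(metric: str, labels: dict[str, str]) -> str:
    """Build a selector that names the metric via __name__ and quotes
    any label name containing dots, '@', or other special characters."""
    parts = [f"__name__={_quote(metric)}"]
    for name, value in labels.items():
        label = name if _PLAIN_LABEL.fullmatch(name) else _quote(name)
        parts.append(f"{label}={_quote(value)}")
    return "{" + ", ".join(parts) + "}"


# Hypothetical filter value; real values come from the discovery step.
selector = build_selector("db.active_sessions.avg", {"db.session.state": "active"})

# Arguments for a get_promql_label_values call. Per the commit message,
# a call without a match filter returns empty, so match is always supplied.
label_values_args = {"label_name": "db.wait.event", "match": selector}
print(label_values_args)
```

Building the selector in one place keeps the quoting rule from being reapplied by hand in every query template.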
Replace separate numbered workflows (1-6) with a single diagnostic procedure of 5 mandatory phases. The agent MUST execute ALL phases before presenting results and must not stop at the first finding.

Phases:
1. Discovery and Baseline Comparison (distribution shifts)
2. Top-SQL Regression Detection (new/growing queries)
3. Workload Attribution (application/role changes)
4. Commit and OCC Analysis (volume vs conflicts)
5. Inflection Point Detection (when did it change)

Adds 'Presenting Results' section mandating a unified report across all dimensions before handoff to Workflow 9.
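To make the Phase 1 idea concrete, here is a minimal sketch of a baseline comparison over wait event distributions. It is not the procedure in workflow.md: the function names, the 10-percentage-point threshold, and the event names and AAS values are all hypothetical, and the inputs are assumed to be average active sessions per wait event already fetched for a baseline window and a current window.

```python
def shares(aas_by_event: dict[str, float]) -> dict[str, float]:
    """Convert per-event AAS values into fractions of the window's total."""
    total = sum(aas_by_event.values())
    if total == 0:
        return {event: 0.0 for event in aas_by_event}
    return {event: value / total for event, value in aas_by_event.items()}


def distribution_shift(
    baseline: dict[str, float],
    current: dict[str, float],
    min_delta: float = 0.10,  # hypothetical threshold: 10 percentage points
) -> list[tuple[str, float]]:
    """Return (event, share change) pairs whose share of total AAS moved by
    at least min_delta between the windows, largest change first."""
    base, cur = shares(baseline), shares(current)
    shifts = []
    for event in sorted(set(base) | set(cur)):
        delta = cur.get(event, 0.0) - base.get(event, 0.0)
        if abs(delta) >= min_delta:
            shifts.append((event, delta))
    shifts.sort(key=lambda item: abs(item[1]), reverse=True)
    return shifts


# Placeholder event names and values, not real DSQL wait events.
baseline = {"event_a": 6.0, "event_b": 3.0, "event_c": 1.0}
current = {"event_a": 4.0, "event_b": 3.0, "event_c": 8.0, "event_d": 5.0}

for event, delta in distribution_shift(baseline, current):
    print(f"{event}: {delta:+.0%} share of AAS")
```

Comparing shares rather than raw AAS separates a change in the mix of wait events from a plain increase in load; an event that appears only in the current window, like `event_d` here, is treated as having a zero baseline share.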
Adds CloudWatch AAS-based system diagnostics to the DSQL skill as Workflow 12.
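What it does
Uses PromQL queries against db.active_sessions.avg OTel metrics to run a mandatory full diagnostic sweep across 5 phases:
1. Discovery and Baseline Comparison (distribution shifts)
2. Top-SQL Regression Detection (new/growing queries)
3. Workload Attribution (application/role changes)
4. Commit and OCC Analysis (volume vs conflicts)
5. Inflection Point Detection (when did it change)

Routes identified queries to Workflow 9 (Query Plan Explainability) for per-query investigation.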
OTel attribute naming
Uses the proposed naming convention:
- db.wait.event, db.wait.class, db.session.state
- db.query.id, db.query.normalized_text
- aws.auroradsql.session.role.arn, application.name

Key design decisions
- match parameter (documented as critical rule)

Files
- references/system-diagnostics/workflow.md: 5-phase diagnostic procedure
- references/system-diagnostics/wait-events.md: canonical DSQL wait event reference
- references/system-diagnostics/promql-patterns.md: reusable PromQL templates
- .mcp.json: adds cloudwatch MCP server (disabled by default)
- SKILL.md: updated description, tags, reference table, Workflow 12, Performance Routing table

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.