diff --git a/plugins/databases-on-aws/skills/dsql/SKILL.md b/plugins/databases-on-aws/skills/dsql/SKILL.md index a0a79d7c..80c3e25f 100644 --- a/plugins/databases-on-aws/skills/dsql/SKILL.md +++ b/plugins/databases-on-aws/skills/dsql/SKILL.md @@ -1,6 +1,6 @@ --- name: dsql -description: "Build with Aurora DSQL — manage schemas, execute queries, handle migrations, diagnose query plans, and develop applications with a serverless, distributed SQL database. Covers IAM auth, multi-tenant patterns, MySQL-to-DSQL migration, DDL operations, query plan explainability, and SQL compatibility validation. Triggers on phrases like: DSQL, Aurora DSQL, create DSQL table, DSQL schema, migrate to DSQL, distributed SQL database, serverless PostgreSQL-compatible database, DSQL query plan, DSQL EXPLAIN ANALYZE, why is my DSQL query slow." +description: "Build with Aurora DSQL — manage schemas, execute queries, handle migrations, diagnose query plans, and develop applications with a serverless, distributed SQL database. Covers IAM auth, multi-tenant patterns, MySQL-to-DSQL migration, DDL operations, query plan explainability, and SQL compatibility validation. Triggers on phrases like: DSQL, Aurora DSQL, create DSQL table, DSQL schema, migrate to DSQL, distributed SQL database, serverless PostgreSQL-compatible database, DSQL query plan, DSQL EXPLAIN ANALYZE, why is my DSQL query slow, DSQL query performance, DSQL full scan, DSQL DPU, DSQL query cost, DSQL latency, optimize this query, this query is slow, explain this plan, query performance, high DPU, make this faster, why is this doing a full scan." 
license: Apache-2.0 metadata: tags: aws, aurora, dsql, distributed-sql, distributed, distributed-database, database, serverless, serverless-database, postgresql, postgres, sql, schema, migration, multi-tenant, iam-auth, aurora-dsql, mcp, orm @@ -35,7 +35,7 @@ Load these files as needed for detailed guidance: **When:** Always load for guidance using or updating the DSQL MCP server **Contains:** Instructions for setting up the DSQL MCP server with 2 configuration options as -sampled in [.mcp.json](../../.mcp.json) +sampled in [mcp/.mcp.json](mcp/.mcp.json) 1. Documentation-Tools Only 2. Database Operations (requires a cluster endpoint) @@ -111,8 +111,10 @@ sampled in [.mcp.json](../../.mcp.json) ### Query Plan Explainability (modular): -**When:** MUST load all four at Workflow 8 Phase 0 — [query-plan/plan-interpretation.md](references/query-plan/plan-interpretation.md), [query-plan/catalog-queries.md](references/query-plan/catalog-queries.md), [query-plan/guc-experiments.md](references/query-plan/guc-experiments.md), [query-plan/report-format.md](references/query-plan/report-format.md) -**Contains:** DSQL node types + Node Duration math + estimation-error bands, pg_class/pg_stats/pg_indexes SQL + correlated-predicate verification, GUC experiment procedures + 30-second skip protocol, required report structure + element checklist + support request template +#### [query-plan/workflow.md](references/query-plan/workflow.md) + +**When:** MUST load at Workflow 8 entry — it gates all other query-plan files +**Contains:** Trigger criteria, context disambiguation, routing, phased workflow (Phase 0–4). 
Workflow.md specifies which reference files to load at each phase — follow its loading instructions rather than loading all files upfront ### SQL Compatibility Validation: @@ -164,16 +166,17 @@ defaults that may change — when a user's decision depends on an exact limit, v | Max indexes per table | 24 | `aurora dsql index limits` | | Max columns per index | 8 | `aurora dsql index limits` | | IDENTITY/SEQUENCE CACHE values | 1 or >= 65536 | `aurora dsql sequence cache` | -| Supported column data types | See docs | `aurora dsql supported data types` | -**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs where hitting a limit would cause failures; any time the exact number can affect user decision. +**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs +where hitting a limit would cause failures. No need to verify for general guidance or when +the exact number doesn't affect the user's decision. -**Fallback:** If `awsknowledge` is unavailable, use the defaults above and flag that limits should be verified against [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/). +**Fallback:** If `awsknowledge` is unavailable, MUST tell the user the lookup failed, MUST name the limit and its default value from the table above, and MUST link to [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/) for verification. When the recommendation depends on the exact value (e.g., batch size at the 3,000 row boundary), MUST refuse the fallback and require the user to verify the limit manually. ## CLI Scripts Available -Bash scripts in [scripts/](../../scripts/) for cluster management (create, delete, list, cluster info), psql connection, and bulk data loading from local/s3 csv/tsv/parquet files. -See [scripts/README.md](../../scripts/README.md) for usage and hook configuration. 
+Bash scripts in [scripts/](scripts/) for cluster management (create, delete, list, cluster info), psql connection, and bulk data loading from local/s3 csv/tsv/parquet files. +See [scripts/README.md](scripts/README.md) for usage. --- @@ -197,7 +200,7 @@ See [scripts/README.md](../../scripts/README.md) for usage and hook configuratio - MUST include tenant_id in all tables - MUST use `CREATE INDEX ASYNC` exclusively - MUST issue each DDL in its own transact call: `transact(["CREATE TABLE ..."])` -- MUST serialize arrays as TEXT or JSON; cast back at query time (`string_to_array(text, ',')` or `jsonb_array_elements_text(json::jsonb)`) +- MUST store arrays/JSON as TEXT ### Workflow 2: Safe Data Migration @@ -215,7 +218,10 @@ Every DDL statement generated in this workflow MUST be validated with `dsql_lint - MUST batch updates under 3,000 rows in separate transact calls - MUST issue each ALTER TABLE in its own transaction -**Recovery — batch fails midway:** Rows already updated keep their new value (each batch committed independently). Resume by filtering on the unset state (`WHERE new_column IS NULL`) and continue. Re-running is safe because the filter naturally excludes completed rows. +**Recovery — batch fails midway:** Rows already updated keep their new value (each batch committed +in its own transaction). Resume by filtering on the unset state — e.g. add +`WHERE new_column IS NULL` (or the sentinel value) to the next UPDATE — and continue from there. +Re-running the entire migration is safe because the filter naturally excludes completed rows. ### Workflow 3: Application-Layer Referential Integrity @@ -252,31 +258,7 @@ Run `dsql_lint(sql=source_sql, fix=true)` to validate and auto-convert PostgreSQ ### Workflow 8: Query Plan Explainability -Explains why the DSQL optimizer chose a particular plan. Triggered by slow queries, high DPU, unexpected Full Scans, or plans the user doesn't understand. 
**REQUIRES a structured Markdown diagnostic report is the deliverable** beyond conversation — run the workflow end-to-end before answering. Use the `aurora-dsql` MCP when connected; fall back to raw `psql` with a generated IAM token (see the fallback block below) otherwise. - -**Phase 0 — Load reference material.** Read all four before starting — each has content later phases need verbatim (node-type math, exact catalog SQL, the `>30s` skip protocol, required report elements): - -1. [query-plan/plan-interpretation.md](references/query-plan/plan-interpretation.md) — node types, duration math, anomalous values -2. [query-plan/catalog-queries.md](references/query-plan/catalog-queries.md) — pg_class / pg_stats / pg_indexes SQL -3. [query-plan/guc-experiments.md](references/query-plan/guc-experiments.md) — GUC procedures and `>30s` skip protocol -4. [query-plan/report-format.md](references/query-plan/report-format.md) — required report structure - -**Phase 1 — Capture the plan.** **ALWAYS** run `readonly_query("EXPLAIN ANALYZE VERBOSE …")` on the user's query verbatim (SELECT form) — **ALWAYS** capture a fresh plan from the cluster, even when the user describes the plan or reports an anomaly. **MAY** leverage `get_schema` or `information_schema` for schema sanity checks. When EXPLAIN errors (`relation does not exist`, `column does not exist`), **MUST** report the error verbatim — **MUST NOT** invent DSQL-specific semantics (e.g., case sensitivity, identifier quoting) as the root cause. Extract Query ID, Planning Time, Execution Time, DPU Estimate. **SELECT** runs as-is. **UPDATE/DELETE** rewrite to the equivalent SELECT (same join chain + WHERE) — the optimizer picks the same plan shape. **INSERT**, pl/pgsql, DO blocks, and functions **MUST** be rejected. **MUST NOT** use `transact --allow-writes` for plan capture; it bypasses MCP safety. 
- -**Phase 2 — Gather evidence.** Using SQL from `catalog-queries.md`, query `pg_class`, `pg_stats`, `pg_indexes`, `COUNT(*)`, `COUNT(DISTINCT)`. Classify estimation errors per `plan-interpretation.md` (2x–5x minor, 5x–50x significant, 50x+ severe). Detect correlated predicates and data skew. - -**Phase 3 — Experiment (conditional).** ≤30s: run GUC experiments per `guc-experiments.md` (default + merge-join-only) plus optional redundant-predicate test. >30s: skip experiments, include the manual GUC testing SQL verbatim in the report, and do not re-run for redundant-predicate testing. Anomalous values (impossible row counts): confirm query results are correct despite the anomalous EXPLAIN, flag as a potential DSQL bug, and produce the Support Request Template from `report-format.md`. - -**Phase 4 — Produce the report, invite reassessment.** Produce the full diagnostic report per the "Required Elements Checklist" in [query-plan/report-format.md](references/query-plan/report-format.md) — structure is non-negotiable. End with the "Next Steps" block from that reference so the user can ask for a reassessment after applying a recommendation. When the user says "reassess" (or equivalent), re-run Phase 1–2 and **append an "Addendum: After-Change Performance"** to the original report (before/after table, match against expected impact) rather than producing a new report. - -**psql fallback (MCP unavailable).** Pipe statements into `psql` via heredoc and check `$?`; report failures without proceeding on partial evidence: - -```bash -TOKEN=$(aws dsql generate-db-connect-admin-auth-token --hostname "$HOST" --region "$REGION") -PGPASSWORD="$TOKEN" psql "host=$HOST port=5432 user=admin dbname=postgres sslmode=require" <<<"EXPLAIN ANALYZE VERBOSE ;" -``` - -**Safety.** Plan capture uses `readonly_query` exclusively — it rejects INSERT/UPDATE/DELETE/DDL at the MCP layer. 
Rewrite DML to SELECT (Phase 1) rather than asking `transact --allow-writes` to run it; write-mode `transact` bypasses all MCP safety checks. **MUST NOT** run arbitrary DDL/DML or pl/pgsql. +Explains why the DSQL optimizer chose a particular plan. **REQUIRES a structured Markdown diagnostic report as the deliverable.** MUST load [query-plan/workflow.md](references/query-plan/workflow.md) for trigger criteria, context disambiguation, routing, and the full phased workflow (Phase 0–4). --- diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md index 9b067cc8..2cd527b0 100644 --- a/plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md @@ -9,7 +9,10 @@ Exact SQL for interrogating optimizer statistics and actual cardinalities agains 3. [Index Definitions](#index-definitions) 4. [Actual Row Counts](#actual-row-counts) 5. [Actual Distinct Counts](#actual-distinct-counts) -6. [Value Distribution Analysis](#value-distribution-analysis) +6. [Column Types for Predicate Columns](#column-types-for-predicate-columns) +7. [B-Tree Cross-Type Operator Support](#b-tree-cross-type-operator-support) +8. [Indexed Column Types](#indexed-column-types) +9. [Value Distribution Analysis](#value-distribution-analysis) --- @@ -103,6 +106,87 @@ Compare against `pg_stats.n_distinct`: - If `n_distinct` is positive: compare directly - If `n_distinct` is negative: multiply absolute value by actual row count to get estimated distinct count +## Column Types for Predicate Columns + +MUST substitute `'{schema}'`, `'{table}'`, and `'{col}'` placeholders in the queries below via `safe_query.build()` with `ident()` — see input-validation.md. 
+ +Retrieve the declared types for columns used in WHERE predicates and JOIN conditions, to detect type coercion index bypass (see plan-interpretation.md): + +```sql +SELECT + c.table_name, + c.column_name, + c.data_type, + c.udt_name, + c.is_nullable +FROM information_schema.columns c +WHERE c.table_schema = '{schema}' + AND c.table_name IN ('{table1}', '{table2}') + AND c.column_name IN ('{col1}', '{col2}'); +``` + +Cross-reference the column type against predicate literals visible in the EXPLAIN output. When the types differ, use the B-Tree Cross-Type Operator Support query below to determine whether the mismatch prevents index usage. + +## B-Tree Cross-Type Operator Support + +Determine which type pairs the DSQL B-Tree access method supports for index scans. If a (predicate-type, column-type) pair has no registered operator, the index cannot be used for that comparison: + +```sql +SELECT DISTINCT + lt.typname AS left_type, + rt.typname AS right_type +FROM pg_amop ao +JOIN pg_type lt ON lt.oid = ao.amoplefttype +JOIN pg_type rt ON rt.oid = ao.amoprighttype +-- 10003 is DSQL's B-Tree OID (PG mainline is 403). +-- Verify with: SELECT oid FROM pg_am WHERE amname = 'btree' +WHERE ao.amopmethod = 10003 + AND ao.amoplefttype != ao.amoprighttype +ORDER BY lt.typname, rt.typname; +``` + +This returns only the cross-type pairs (where left and right types differ). Same-type pairs are always supported. Use this to confirm whether a suspected type mismatch actually prevents index usage — if the pair appears in the result, the index CAN be used and the issue lies elsewhere. 
+ +To check a specific pair: + +```sql +SELECT EXISTS ( + SELECT 1 + FROM pg_amop ao + JOIN pg_type lt ON lt.oid = ao.amoplefttype + JOIN pg_type rt ON rt.oid = ao.amoprighttype + -- 10003 = DSQL B-Tree OID; verify with: SELECT oid FROM pg_am WHERE amname = 'btree' + WHERE ao.amopmethod = 10003 + AND lt.typname = '{predicate_type}' + AND rt.typname = '{column_type}' +) AS index_usable; +``` + +## Indexed Column Types + +Retrieve index definitions together with their column types to identify type coercion bypass candidates: + +```sql +SELECT + i.indexname, + i.tablename, + a.attname AS column_name, + t.typname AS column_type, + i.indexdef +FROM pg_indexes i +JOIN pg_class ic ON ic.relname = i.indexname +JOIN pg_index ix ON ix.indexrelid = ic.oid +JOIN pg_attribute a ON a.attrelid = ix.indrelid + AND a.attnum = ANY(ix.indkey) +JOIN pg_type t ON t.oid = a.atttypid +JOIN pg_namespace n ON n.oid = ic.relnamespace +WHERE n.nspname = '{schema}' + AND i.tablename IN ('{table1}', '{table2}') +ORDER BY i.tablename, i.indexname, a.attnum; +``` + +Use this when a Full Scan appears despite an apparently usable index — compare the index column's `column_type` against the predicate literal's inferred type. + ## Value Distribution Analysis For columns with suspected data skew, retrieve the actual top-N value frequencies: diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/plan-interpretation.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/plan-interpretation.md index da4fefa5..8917896d 100644 --- a/plugins/databases-on-aws/skills/dsql/references/query-plan/plan-interpretation.md +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/plan-interpretation.md @@ -11,7 +11,8 @@ 7. [Hash Table Resizing](#hash-table-resizing) 8. [High-Loop Storage Lookups](#high-loop-storage-lookups) 9. [Anomalous Values](#anomalous-values) -10. [Projections and Row Width](#projections-and-row-width) +10. 
[Type Coercion and Index Bypass](#type-coercion-and-index-bypass) +11. [Projections and Row Width](#projections-and-row-width) --- @@ -183,6 +184,63 @@ Detect physically impossible row counts in DSQL plan nodes: These anomalous values do not affect query correctness — only diagnostic output accuracy. +## Type Coercion and Index Bypass + +An index may exist on a column yet not be used when the predicate value's type does not match the column's declared type and no cross-type B-Tree operator is registered for the type pair. + +### Detection Pattern + +Flag this condition when **all** of the following are true: + +1. An index exists whose leading column matches a WHERE predicate column +2. The plan uses a Full Scan or Seq Scan on that table instead of an Index Scan +3. The predicate literal's type differs from the indexed column's declared type +4. The `pg_amop` query in catalog-queries.md (B-Tree Cross-Type Operator Support) returns no row for the type pair + +### Why It Happens + +DSQL (like PostgreSQL) can only use a B-Tree index when a cross-type B-Tree operator is registered in `pg_amop` for the (predicate-type, column-type) pair. When a predicate supplies a value of a different type: + +- If a cross-type B-Tree operator is registered (verify via the `pg_amop` query in catalog-queries.md), the index can be used +- If no cross-type operator is registered, the planner falls back to a per-row cast or comparison function that cannot use the index's ordering — resulting in a full scan + +This is particularly surprising to users because the query returns correct results (the cast happens at execution time, row by row) but performance degrades dramatically on large tables. + +### Determining Index-Compatible Type Pairs + +Rather than relying on a static matrix, query `pg_amop` directly on the cluster to determine which cross-type comparisons the DSQL B-Tree index access method supports. See catalog-queries.md for the exact SQL.
The key insight: DSQL's B-Tree access method (amopmethod `10003`) only supports index scans when a registered operator exists for the specific (left-type, right-type) pair. If no operator is registered for the pair, the index cannot be used — regardless of whether a general-purpose implicit cast exists in `pg_cast`. + +At time of writing, cross-type index support is limited to the integer family (smallint, integer, bigint — all combinations). All other indexed types (text, numeric, uuid, timestamp, date, boolean, etc.) require an exact type match. MUST verify via the `pg_amop` query in catalog-queries.md before asserting this to a user, as DSQL MAY add cross-type operator families in future releases. + +### Quantifying Impact + +When this pattern is detected: + +``` +Full Scan rows processed = actual_rows from Full Scan node +Index Scan rows (expected) = estimated rows matching the predicate (from pg_stats selectivity) +Scan amplification = Full Scan rows / Index Scan rows (expected) +``` + +For example, a Full Scan that processed 1,000,000 rows for a predicate expected to match 500 rows yields a scan amplification of 2,000x. + +### Recommendation Template + +When a type coercion bypass is confirmed: + +- **Explicit cast in the predicate:** Rewrite `WHERE col = 42` as `WHERE col = 42::numeric` when `col` is declared `numeric` — cast the literal to the column's declared type +- **Application-layer fix:** Ensure the application passes parameters with the correct type rather than relying on implicit conversion +- **MUST keep the column type unchanged** — changing it to accommodate mismatched predicates masks the real issue and MAY break other queries + +### Evidence Gathering + +To confirm this pattern, cross-reference: + +1. The column type from `pg_attribute` or `information_schema.columns` (see catalog-queries.md) +2. The index definition from `pg_indexes` +3. The predicate literal in the EXPLAIN output (visible in `Filter:` or `Index Cond:` lines) +4.
The `pg_amop` query in catalog-queries.md (B-Tree Cross-Type Operator Support) + ## Projections and Row Width Capture Projections lists from Storage Scan and Storage Lookup nodes: diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites-dsql-specific.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites-dsql-specific.md new file mode 100644 index 00000000..9e0db458 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites-dsql-specific.md @@ -0,0 +1,10 @@ +# Query Rewrites — DSQL-Specific + +SQL rewrites that address Aurora DSQL-specific behaviors and optimizer constraints. These SHOULD be recommended when the plan reveals inefficiency unique to DSQL's distributed architecture. + +## Available Rewrites + +| Pattern Detected | Reference File | +| ------------------------------- | ------------------------------------------------------------- | +| COUNT(*) timeout on large table | [reltuples-estimate.md](query-rewrites/reltuples-estimate.md) | +| Join count exceeds DP threshold | [split-large-joins.md](query-rewrites/split-large-joins.md) | diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites-generic.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites-generic.md new file mode 100644 index 00000000..9aad7cbf --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites-generic.md @@ -0,0 +1,19 @@ +# Query Rewrites — Index + +Generic SQL rewrites that SHOULD be recommended when a plan reveals inefficiency traceable to query structure (rather than missing indexes or stale statistics). Load the specific rewrite file that matches the observed pattern. 
+ +## Available Rewrites + +| Pattern Detected | Reference File | +| ------------------------------------------ | --------------------------------------------------------------------------------------- | +| Multiple OR on same column | [or-to-in.md](query-rewrites/or-to-in.md) | +| LEFT JOIN with null-rejecting WHERE | [left-join-to-inner.md](query-rewrites/left-join-to-inner.md) | +| Filter on join column not propagated | [propagate-filter.md](query-rewrites/propagate-filter.md) | +| Uncorrelated IN-subquery | [subquery-unnesting-uncorrelated.md](query-rewrites/subquery-unnesting-uncorrelated.md) | +| Correlated EXISTS subquery | [subquery-unnesting-correlated.md](query-rewrites/subquery-unnesting-correlated.md) | +| Scalar correlated subquery in SELECT | [subquery-unnesting-scalar.md](query-rewrites/subquery-unnesting-scalar.md) | +| Computation on indexed column in predicate | [push-computation-to-constant.md](query-rewrites/push-computation-to-constant.md) | +| Large IN-subquery result set | [in-subquery-to-exists.md](query-rewrites/in-subquery-to-exists.md) | +| GROUP BY after JOIN with dimension columns | [push-group-by-into-subquery.md](query-rewrites/push-group-by-into-subquery.md) | +| NOT IN with large or nullable subquery | [not-in-to-not-exists.md](query-rewrites/not-in-to-not-exists.md) | +| Nested UNION ALL | [flatten-union-all.md](query-rewrites/flatten-union-all.md) | diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/flatten-union-all.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/flatten-union-all.md new file mode 100644 index 00000000..3a483e57 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/flatten-union-all.md @@ -0,0 +1,51 @@ +# Rewrite: Flatten Nested UNION ALL + +When a query contains UNION ALL nested inside another UNION ALL, flatten all branches into a single UNION ALL to simplify the plan and reduce intermediate merge steps. 
+ +**SHOULD apply when:** All set operations are UNION ALL (no deduplication). + +**SHOULD skip when:** Any branch uses UNION (deduplicating), which MUST remain distinct. + +```sql +-- Original +SELECT * FROM sales_q1 +UNION ALL ( + SELECT * FROM sales_q2 + UNION ALL + SELECT * FROM sales_q3 +); + +-- Rewritten +SELECT * FROM sales_q1 +UNION ALL +SELECT * FROM sales_q2 +UNION ALL +SELECT * FROM sales_q3; +``` + +```sql +-- CTE example +-- Original +WITH a AS ( + SELECT * FROM t1 + UNION ALL + SELECT * FROM t2 +) +SELECT * FROM a +UNION ALL +SELECT * FROM t3; + +-- Rewritten +SELECT * FROM t1 +UNION ALL +SELECT * FROM t2 +UNION ALL +SELECT * FROM t3; +``` + +```sql +-- Not applicable: UNION (deduplicating) must stay distinct +SELECT * FROM t1 +UNION +SELECT * FROM t2; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/in-subquery-to-exists.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/in-subquery-to-exists.md new file mode 100644 index 00000000..fd378060 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/in-subquery-to-exists.md @@ -0,0 +1,56 @@ +# Rewrite: Replace IN-Subquery with EXISTS + +When a column is compared to a subquery using IN and the subquery may return many rows, rewrite as a correlated EXISTS to leverage short-circuit evaluation. + +**SHOULD apply when:** The IN subquery returns a large or variable number of rows. + +**SHOULD skip when:** The IN list is a small static set of constants. 
+ +```sql +-- Original +SELECT * +FROM customers +WHERE customer_id IN ( + SELECT customer_id + FROM orders + WHERE order_date >= NOW() - INTERVAL '30 days' +); + +-- Rewritten +SELECT * +FROM customers c +WHERE EXISTS ( + SELECT 1 + FROM orders o + WHERE o.customer_id = c.customer_id + AND o.order_date >= NOW() - INTERVAL '30 days' +); +``` + +```sql +-- Additional example +SELECT product_id +FROM products +WHERE product_id IN ( + SELECT product_id + FROM inventory + WHERE quantity > 0 +); + +-- Rewritten +SELECT product_id +FROM products p +WHERE EXISTS ( + SELECT 1 + FROM inventory i + WHERE i.product_id = p.product_id + AND i.quantity > 0 +); +``` + +```sql +-- Not applicable: small static set of constants +SELECT * +FROM users +WHERE user_type IN ('admin', 'editor', 'viewer'); +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/left-join-to-inner.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/left-join-to-inner.md new file mode 100644 index 00000000..506ac49f --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/left-join-to-inner.md @@ -0,0 +1,30 @@ +# Rewrite: LEFT JOIN with Null-Rejecting Predicate to INNER JOIN + +When a query uses LEFT JOIN but the WHERE clause rejects NULLs on the joined table, rewrite as INNER JOIN. This enables a simpler, more efficient join plan. + +**SHOULD apply when:** The WHERE clause rejects NULLs from the right-hand side of a LEFT JOIN (e.g., `IS NOT NULL`, equality comparisons, or any predicate that cannot be true for NULL). + +**SHOULD skip when:** NULLs from the right-hand side are intentionally preserved in the result. 
+ +```sql +-- Original +SELECT * +FROM R1 +LEFT JOIN R2 + ON R1.key = R2.key +WHERE R2.key IS NOT NULL; + +-- Rewritten +SELECT * +FROM R1 +JOIN R2 + ON R1.key = R2.key; +``` + +```sql +-- Not applicable: NULLs from R2 are intentionally preserved +SELECT * +FROM R1 +LEFT JOIN R2 + ON R1.key = R2.key; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/not-in-to-not-exists.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/not-in-to-not-exists.md new file mode 100644 index 00000000..b209bb73 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/not-in-to-not-exists.md @@ -0,0 +1,56 @@ +# Rewrite: Replace NOT IN with NOT EXISTS + +When a column is filtered with `NOT IN (subquery)`, rewrite as a correlated NOT EXISTS. This avoids building a large intermediate set. + +**Semantics warning:** NOT EXISTS does not preserve NOT IN's NULL-propagation behavior. When the subquery MAY contain NULLs, `NOT IN` returns no rows while `NOT EXISTS` returns the non-matching rows — the rewrite changes results. MUST confirm intent with the user before applying when NULLs are possible. + +**SHOULD apply when:** The NOT IN subquery returns many rows and the subquery column is guaranteed NOT NULL (or the user confirms the changed NULL behavior is acceptable). + +**SHOULD skip when:** The exclusion list is a small static set of constants.
+ +```sql +-- Original +SELECT * +FROM customers +WHERE customer_id NOT IN ( + SELECT customer_id + FROM blacklisted_customers +); + +-- Rewritten +SELECT * +FROM customers c +WHERE NOT EXISTS ( + SELECT 1 + FROM blacklisted_customers b + WHERE b.customer_id = c.customer_id +); +``` + +```sql +-- Additional example +SELECT product_id +FROM products +WHERE product_id NOT IN ( + SELECT product_id + FROM discontinued_products + WHERE discontinued = true +); + +-- Rewritten +SELECT p.product_id +FROM products p +WHERE NOT EXISTS ( + SELECT 1 + FROM discontinued_products d + WHERE d.product_id = p.product_id + AND d.discontinued = true +); +``` + +```sql +-- Not applicable: small static exclusion set +SELECT * +FROM items +WHERE item_type NOT IN ('typeA', 'typeB'); +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/or-to-in.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/or-to-in.md new file mode 100644 index 00000000..d76718ec --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/or-to-in.md @@ -0,0 +1,38 @@ +# Rewrite: OR to IN + +Rewrite multiple OR clauses comparing the same column to different constant values into a single IN clause. This enables more efficient index lookups and reduces redundant OR evaluations. + +**SHOULD apply when:** All OR comparisons target the same column using equality (`=`) with constant values. + +**SHOULD skip when:** OR clauses compare different columns or involve non-constant expressions. 
+ +```sql +-- Original +SELECT * +FROM R +WHERE R.key = c1 OR R.key = c2; + +-- Rewritten +SELECT * +FROM R +WHERE R.key IN (c1, c2); +``` + +```sql +-- Additional example +SELECT name, age +FROM employees +WHERE department_id = 1 OR department_id = 2 OR department_id = 3; + +-- Rewritten +SELECT name, age +FROM employees +WHERE department_id IN (1, 2, 3); +``` + +```sql +-- Not applicable: different columns involved +SELECT name, age +FROM employees +WHERE department_id = 1 OR location_id = 2; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/propagate-filter.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/propagate-filter.md new file mode 100644 index 00000000..0d7fdb0c --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/propagate-filter.md @@ -0,0 +1,48 @@ +# Rewrite: Propagate Filter to JOIN Columns + +When a query has an equality join condition and a filter predicate on one join attribute, propagate the filter to the corresponding attribute on the other table(s). This enables earlier filtering and reduces intermediate result sizes. + +**SHOULD apply when:** The filter predicate is on a column involved in an equality join condition. + +**SHOULD skip when:** The predicate is on a non-join column. 
+ +```sql +-- Original +SELECT * +FROM R1, R2 +WHERE R1.id = R2.id + AND R1.id > 10; + +-- Rewritten +SELECT * +FROM R1, R2 +WHERE R1.id = R2.id + AND R1.id > 10 + AND R2.id > 10; +``` + +```sql +-- Transitive propagation across multiple tables +SELECT * +FROM R1, R2, R3 +WHERE R1.id = R2.id + AND R2.id = R3.id + AND R1.id > 10; + +-- Rewritten +SELECT * +FROM R1, R2, R3 +WHERE R1.id = R2.id + AND R2.id = R3.id + AND R1.id > 10 + AND R2.id > 10 + AND R3.id > 10; +``` + +```sql +-- Not applicable: predicate is on a non-join column +SELECT * +FROM R1, R2 +WHERE R1.id = R2.id + AND R1.other_column > 10; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-computation-to-constant.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-computation-to-constant.md new file mode 100644 index 00000000..1af84cd4 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-computation-to-constant.md @@ -0,0 +1,33 @@ +# Rewrite: Push Computation to Constant Side + +When a filter predicate applies invertible arithmetic to an indexed column, move the computation to the constant side so the column appears alone and indexes can be used. + +**SHOULD apply when:** All operations on the column are mathematically invertible (addition, subtraction, multiplication/division by non-zero constant). + +**SHOULD skip when:** The computation involves non-invertible functions (substring, lower/upper, trigonometric functions) or moving the computation changes query semantics (precision loss, integer-division rounding). 
+ +```sql +-- Original (amount is NUMERIC) +SELECT * FROM transactions +WHERE amount * 100 / 5 = 2000.00; + +-- Rewritten +SELECT * FROM transactions +WHERE amount = 2000.00 * 5 / 100; +``` + +```sql +-- Additional example +SELECT * FROM orders +WHERE order_id + 5 > 100; + +-- Rewritten +SELECT * FROM orders +WHERE order_id > 100 - 5; +``` + +```sql +-- Not applicable: non-invertible function +SELECT * FROM users +WHERE substring(username, 1, 3) = 'abc'; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-group-by-into-subquery.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-group-by-into-subquery.md new file mode 100644 index 00000000..16d0ae7b --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-group-by-into-subquery.md @@ -0,0 +1,65 @@ +# Rewrite: Push GROUP BY into Subquery + +When a query aggregates after joining a fact table to a dimension table, push the GROUP BY into a subquery on the fact table alone. This aggregates fewer rows and joins the smaller result to retrieve dimension columns. + +**SHOULD apply when:** The aggregation is on the fact table and additional columns come from a dimension table joined on the grouping key. + +**SHOULD skip when:** No additional columns are needed beyond the grouping key. 
+ +```sql +-- Original +SELECT c.customer_id, + c.first_name, + c.last_name, + COUNT(*) AS order_count +FROM customers c +JOIN orders o + ON c.customer_id = o.customer_id +GROUP BY c.customer_id, c.first_name, c.last_name; + +-- Rewritten +SELECT c.customer_id, + c.first_name, + c.last_name, + agg.order_count +FROM customers c +JOIN ( + SELECT customer_id, + COUNT(*) AS order_count + FROM orders + GROUP BY customer_id +) AS agg + ON c.customer_id = agg.customer_id; +``` + +```sql +-- Additional example +SELECT cat.category_name, + cat.description, + SUM(t.amount) AS total_amount +FROM categories cat +JOIN transactions t + ON cat.id = t.category_id +GROUP BY cat.category_name, cat.description; + +-- Rewritten +SELECT cat.category_name, + cat.description, + agg.total_amount +FROM categories cat +JOIN ( + SELECT category_id, + SUM(amount) AS total_amount + FROM transactions + GROUP BY category_id +) AS agg + ON cat.id = agg.category_id; +``` + +```sql +-- Not applicable: no additional columns needed +SELECT department_id, + SUM(salary) AS total_salary +FROM employees +GROUP BY department_id; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/reltuples-estimate.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/reltuples-estimate.md new file mode 100644 index 00000000..6008b23e --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/reltuples-estimate.md @@ -0,0 +1,26 @@ +# Rewrite: Replace COUNT(*) with reltuples Estimate (DSQL-Specific) + +When a query performs `COUNT(*)` on a large table, rewrite to use the `reltuples` value from `pg_class` for an approximate row count. This is a common workaround for cases where `COUNT(*)` is too slow or times out on large tables. + +**SHOULD apply when:** An approximate count is acceptable and the table is large enough that `COUNT(*)` is prohibitively expensive. 
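Before relying on the estimate, a staleness cross-check can be run first. A minimal sketch, assuming DSQL exposes the standard PostgreSQL `pg_stat_user_tables` view (the table name `big_table` is illustrative):

```sql
-- When statistics were last refreshed; a NULL or old last_analyze means
-- reltuples may be far from the true row count
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'big_table';
```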
+ +**Staleness warning:** `reltuples` reflects the last `ANALYZE` or autovacuum run. MUST warn the user that the value MAY be stale on write-heavy or recently created tables. SHOULD recommend cross-checking `pg_stat_user_tables.last_analyze` when the count drives a decision. + +**SHOULD skip when:** The application requires an exact count. + +```sql +-- Original +SELECT COUNT(*) AS exact_count +FROM big_table; + +-- Rewritten (DSQL) +SELECT reltuples::bigint AS estimated_count +FROM pg_class +WHERE oid = 'public.big_table'::regclass; +``` + +```sql +-- Not applicable: exact count required +SELECT COUNT(*) AS exact_count +FROM big_table; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/split-large-joins.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/split-large-joins.md new file mode 100644 index 00000000..aa0bffdb --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/split-large-joins.md @@ -0,0 +1,53 @@ +# Rewrite: Split Large Joins for DP Join Ordering (DSQL-Specific) + +When a query joins more tables than the optimizer's DP threshold (e.g., 10 joins for Aurora DSQL), rewrite it into multiple subqueries each joining no more tables than the threshold, then join the subquery results. + +This allows the PostgreSQL-based DSQL engine to apply dynamic-programming (DP) join ordering within each smaller block, producing a better overall join plan than a greedy algorithm on many tables. + +**SHOULD apply when:** The total number of joined tables exceeds the DP threshold (`join_collapse_limit` or `from_collapse_limit`). Partition the join into CTEs each with table count at or below the threshold, push down relevant filters, and join the CTE results. + +**SHOULD skip when:** The total table count is at or below the threshold, or splitting would prevent necessary cross-block optimizations. 
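Before splitting, SHOULD confirm the engine's actual thresholds rather than assuming 10. A minimal check, assuming the standard PostgreSQL GUCs are readable on DSQL:

```sql
-- Joins beyond these limits are no longer ordered by exhaustive DP search
SHOW join_collapse_limit;
SHOW from_collapse_limit;
```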
+ +```sql +-- Original +SELECT * +FROM R1 + JOIN R2 ON R1.id = R2.id + JOIN R3 ON R2.id = R3.id + JOIN R4 ON R3.id = R4.id + JOIN R5 ON R4.id = R5.id + JOIN R6 ON R5.id = R6.id + JOIN R7 ON R6.id = R7.id +WHERE Filters; + +-- Rewritten (DSQL) +WITH + sub1 AS ( + SELECT * + FROM R1 + JOIN R2 ON R1.id = R2.id + JOIN R3 ON R2.id = R3.id + JOIN R4 ON R3.id = R4.id + WHERE Filters -- the subset of Filters referencing only R1..R4 + ), + sub2 AS ( + SELECT * + FROM R5 + JOIN R6 ON R5.id = R6.id + JOIN R7 ON R6.id = R7.id + WHERE Filters -- the subset of Filters referencing only R5..R7 + ) +SELECT * +FROM sub1 +JOIN sub2 ON sub1.id = sub2.id; +``` + +```sql +-- Not applicable: total tables ≤ DP threshold +SELECT * +FROM R1 + JOIN R2 ON R1.id = R2.id + JOIN R3 ON R2.id = R3.id + JOIN R4 ON R3.id = R4.id +WHERE Filters; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-correlated.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-correlated.md new file mode 100644 index 00000000..1e28ca1a --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-correlated.md @@ -0,0 +1,58 @@ +# Rewrite: Subquery Unnesting — Correlated + +When a query contains a correlated EXISTS subquery that the optimizer handles poorly, rewrite it as an explicit JOIN. This MAY expose the subquery to better join optimizations, especially when indexes exist on the join columns. + +**SHOULD apply when:** The correlated subquery is inside an EXISTS clause, the correlation is expressible as a JOIN condition (typically equality), and the inner side is unique on the join key (otherwise DISTINCT changes results by collapsing pre-existing duplicates in the outer table). + +**SHOULD skip when:** The correlation cannot be expressed as a simple JOIN condition, or the inner side is not unique on the join key and duplicate preservation matters.
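Verifying the inner-side uniqueness precondition is cheap. A sketch with hypothetical table `S` and join key `x` (substitute the names from the actual query):

```sql
-- Any returned row proves x is not unique on the inner side, so the JOIN
-- rewrite needs DISTINCT and the duplicate-preservation caveat applies
SELECT x, COUNT(*) AS dup_count
FROM S
GROUP BY x
HAVING COUNT(*) > 1
LIMIT 1;
```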
+ +```sql +-- Original +SELECT * +FROM R +WHERE EXISTS ( + SELECT 1 + FROM S + WHERE S.x = R.x + AND S.y > 0 +); + +-- Rewritten (apply only when S.x is unique; otherwise DISTINCT +-- collapses pre-existing duplicates in R) +SELECT DISTINCT R.* +FROM R +JOIN S + ON S.x = R.x + AND S.y > 0; +``` + +```sql +-- Additional example +SELECT product_id +FROM products +WHERE EXISTS ( + SELECT 1 + FROM product_reviews + WHERE product_reviews.product_id = products.product_id + AND product_reviews.rating >= 4 +); + +-- Rewritten (product_reviews.product_id is not unique, so +-- DISTINCT is required — verify this is acceptable) +SELECT DISTINCT products.product_id +FROM products +JOIN product_reviews + ON product_reviews.product_id = products.product_id + AND product_reviews.rating >= 4; +``` + +```sql +-- Not applicable: correlation cannot be expressed as a JOIN condition +SELECT * +FROM R +WHERE EXISTS ( + SELECT 1 + FROM S + WHERE S.x + S.y = R.z +); +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-scalar.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-scalar.md new file mode 100644 index 00000000..fd2872b0 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-scalar.md @@ -0,0 +1,61 @@ +# Rewrite: Subquery Unnesting — Scalar + +When a query contains a scalar subquery in the SELECT clause computing an aggregate correlated by equality, rewrite it as a LEFT JOIN with GROUP BY. This reduces repeated subquery executions and enables better join planning. + +**SHOULD apply when:** The scalar subquery is correlated via equality and contains an aggregate function (MAX, MIN, COUNT, SUM). For COUNT and SUM, MUST wrap with `COALESCE(..., 0)` because the LEFT JOIN returns NULL (not 0) for unmatched rows — the scalar subquery returns 0. + +**SHOULD skip when:** The scalar subquery is uncorrelated. 
+ +```sql +-- Original +SELECT + R.*, + (SELECT MAX(S.y) + FROM S + WHERE S.x = R.x) AS max_y +FROM R; + +-- Rewritten +SELECT + R.*, + Agg.max_y +FROM R +LEFT JOIN ( + SELECT x, MAX(y) AS max_y + FROM S + GROUP BY x +) AS Agg + ON Agg.x = R.x; +``` + +```sql +-- Additional example +SELECT + R.id, + R.name, + (SELECT COUNT(*) + FROM S + WHERE S.owner_id = R.id) AS s_count +FROM R; + +-- Rewritten (COALESCE required — COUNT returns 0, LEFT JOIN returns NULL) +SELECT + R.id, + R.name, + COALESCE(Agg.s_count, 0) AS s_count +FROM R +LEFT JOIN ( + SELECT owner_id, COUNT(*) AS s_count + FROM S + GROUP BY owner_id +) AS Agg + ON Agg.owner_id = R.id; +``` + +```sql +-- Not applicable: scalar subquery is uncorrelated +SELECT + R.*, + (SELECT MAX(S.y) FROM S) AS global_max_y +FROM R; +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-uncorrelated.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-uncorrelated.md new file mode 100644 index 00000000..010c5cd4 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/subquery-unnesting-uncorrelated.md @@ -0,0 +1,65 @@ +# Rewrite: Subquery Unnesting — Uncorrelated + +When a query contains an uncorrelated `IN (SELECT ...)` subquery, rewrite it as an EXISTS (preferred, preserves semi-join semantics) or explicit JOIN. This enables better join order optimizations and index usage. + +**SHOULD apply when:** The subquery does not reference columns from the outer query. + +**SHOULD skip when:** The subquery is correlated (references outer query columns). 
+ +```sql +-- Original +SELECT * +FROM R +WHERE R.a IN ( + SELECT S.b + FROM S +); + +-- Rewritten (preferred — EXISTS preserves semi-join semantics) +SELECT * +FROM R +WHERE EXISTS ( + SELECT 1 + FROM S + WHERE S.b = R.a +); + +-- Alternative (JOIN form — apply only when S.b is unique, +-- otherwise DISTINCT collapses pre-existing duplicates in R) +SELECT DISTINCT R.* +FROM R +JOIN S + ON R.a = S.b; +``` + +```sql +-- Additional example +SELECT order_id +FROM orders +WHERE customer_id IN ( + SELECT customer_id + FROM customers + WHERE country = 'US' +); + +-- Rewritten +SELECT order_id +FROM orders +WHERE EXISTS ( + SELECT 1 + FROM customers + WHERE customers.customer_id = orders.customer_id + AND customers.country = 'US' +); +``` + +```sql +-- Not applicable: subquery is correlated +SELECT * +FROM R +WHERE R.a IN ( + SELECT S.b + FROM S + WHERE S.c = R.d +); +``` diff --git a/plugins/databases-on-aws/skills/dsql/references/query-plan/workflow.md b/plugins/databases-on-aws/skills/dsql/references/query-plan/workflow.md new file mode 100644 index 00000000..785e7cb3 --- /dev/null +++ b/plugins/databases-on-aws/skills/dsql/references/query-plan/workflow.md @@ -0,0 +1,127 @@ +# Query Plan Explainability — Workflow + +Complete workflow for diagnosing DSQL query plan performance issues. Produces a structured Markdown diagnostic report as the deliverable. + +## Table of Contents + +1. [Trigger Criteria](#trigger-criteria) +2. [Context Disambiguation](#context-disambiguation) +3. [Routing](#routing) +4. [Phase 0: Load Reference Material](#phase-0-load-reference-material) +5. [Phase 1: Capture the Plan](#phase-1-capture-the-plan) +6. [Phase 2: Gather Evidence](#phase-2-gather-evidence) +7. [Phase 3: Experiment (conditional)](#phase-3-experiment-conditional) +8. [Phase 4: Produce the Report, Invite Reassessment](#phase-4-produce-the-report-invite-reassessment) +9. 
[Safety](#safety) + +--- + +## Trigger Criteria + +Enter this workflow if **ANY** of these signals are present: + +| Signal | Examples | +| ----------------------------------------------------- | ----------------------------------------------------------------------------- | +| User provides SQL + mentions performance/speed/cost | "this query takes 8 seconds", "too slow", "optimize this", "make this faster" | +| User mentions DPU cost or resource consumption | "high DPU", "query cost is too high", "read DPU seems excessive" | +| User asks about a plan choice or scan type | "why is it doing a full scan?", "why not use the index?" | +| User pastes EXPLAIN / EXPLAIN ANALYZE output | Raw plan text in the message | +| User references a Query ID and asks about performance | "query abc-123 is slow" | +| User says "reassess" / "re-run" / "I added the index" | Reassessment re-entry — re-runs Phase 1–2 and appends an Addendum per Phase 4 | + +--- + +## Context Disambiguation + +Before entering the workflow, confirm the query targets DSQL: + +| Condition | Action | +| ----------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- | +| Only `aurora-dsql` MCP is connected (no other database MCPs) | Proceed — DSQL is the only target | +| User explicitly mentions DSQL, Aurora DSQL, or a known DSQL cluster | Proceed | +| Conversation already has prior DSQL interaction (earlier queries, schema ops) | Proceed | +| Multiple database MCPs are connected and no DSQL signal in the message | Ask the user which database they mean before proceeding | +| No database MCP is connected | Inform the user that the `aurora-dsql` MCP is required and offer the psql fallback | + +--- + +## Routing + +| Condition | Path | +| ---------------------------------------------------------- | 
------------------------------------------------------------------------------------------------------------------------------------------------------ | +| User provides SQL but no plan output | Full workflow: Phase 0 → 1 → 2 → 3 → 4 | +| User pastes plan output + asks to fix/optimize | Full workflow: Phase 0 → 1 (re-capture fresh plan) → 2 → 3 → 4 | +| User pastes plan output + asks what it means (educational) | Full workflow: Phase 0 → 1 (re-capture fresh plan) → 2 → 3 → 4. The report is the explanation — do not produce a shorter conversational answer instead | +| Execution time >30s detected at Phase 1 | Phase 3 skips experiments per guc-experiments.md | +| User says "reassess" or equivalent | Re-run Phase 1–2, append Addendum to existing report | + +--- + +## Phase 0: Load Reference Material + +MUST read these four files before starting — each has content later phases need verbatim (node-type math, exact catalog SQL, the `>30s` skip protocol, required report elements): + +1. [plan-interpretation.md](plan-interpretation.md) — node types, duration math, anomalous values +2. [catalog-queries.md](catalog-queries.md) — pg_class / pg_stats / pg_indexes SQL +3. [guc-experiments.md](guc-experiments.md) — GUC procedures and `>30s` skip protocol +4. [report-format.md](report-format.md) — required report structure + +SHOULD also load these index files to identify applicable rewrites at Phase 2: + +1. [query-rewrites-generic.md](query-rewrites-generic.md) — pattern index (load specific sub-file when a match is found) +2. [query-rewrites-dsql-specific.md](query-rewrites-dsql-specific.md) — DSQL-specific pattern index + +--- + +## Phase 1: Capture the Plan + +**ALWAYS** run `readonly_query("EXPLAIN ANALYZE VERBOSE …")` on the user's query verbatim (SELECT form) — **ALWAYS** capture a fresh plan from the cluster, even when the user describes the plan or reports an anomaly. **MAY** leverage `get_schema` or `information_schema` for schema sanity checks. 
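A sketch of plan capture for a DML statement, with hypothetical table and column names. The UPDATE is first rewritten to a SELECT that preserves the same WHERE clause:

```sql
-- User's statement (never executed for plan capture):
--   UPDATE orders SET status = 'expired' WHERE order_date < '2024-01-01';

-- Equivalent SELECT form submitted via readonly_query:
EXPLAIN ANALYZE VERBOSE
SELECT *
FROM orders
WHERE order_date < '2024-01-01';
```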
+ +When EXPLAIN errors (`relation does not exist`, `column does not exist`), **MUST** report the error verbatim — **MUST NOT** invent DSQL-specific semantics (e.g., case sensitivity, identifier quoting) as the root cause. + +Extract: Query ID, Planning Time, Execution Time, DPU Estimate. + +| Statement type | Action | +| -------------------------------------- | -------------------------------------------------------------------------------------------- | +| SELECT | Run as-is | +| UPDATE / DELETE | Rewrite to equivalent SELECT (same join chain + WHERE) — optimizer picks the same plan shape | +| INSERT, PL/pgSQL, DO blocks, functions | **MUST** reject | + +**MUST NOT** use `transact --allow-writes` for plan capture; it bypasses MCP safety. + +--- + +## Phase 2: Gather Evidence + +Using SQL from `catalog-queries.md`, query `pg_class`, `pg_stats`, `pg_indexes`, `COUNT(*)`, `COUNT(DISTINCT)`. + +1. Classify estimation errors per `plan-interpretation.md` (2x–5x minor, 5x–50x significant, 50x+ severe). +2. Detect correlated predicates and data skew. +3. When a Full Scan appears despite an apparently usable index, check for **type coercion index bypass**: retrieve indexed column types and compare against predicate literal types using the `pg_amop` query in `catalog-queries.md` (B-Tree Cross-Type Operator Support). +4. Check whether any query rewrite from `query-rewrites-generic.md` or `query-rewrites-dsql-specific.md` applies to the query structure (e.g., OR-to-IN, subquery unnesting, NOT IN to NOT EXISTS, split large joins). + +--- + +## Phase 3: Experiment (conditional) + +- **≤30s:** Run GUC experiments per `guc-experiments.md` (default + merge-join-only) plus optional redundant-predicate test. +- **>30s:** Skip experiments, include the manual GUC testing SQL verbatim in the report, and do not re-run for redundant-predicate testing.
+- **Anomalous values** (impossible row counts): confirm query results are correct despite the anomalous EXPLAIN, flag as a potential DSQL bug, and produce the Support Request Template from `report-format.md`. + +--- + +## Phase 4: Produce the Report, Invite Reassessment + +Produce the full diagnostic report per the "Required Elements Checklist" in [report-format.md](report-format.md) — structure is non-negotiable. + +End with the "Next Steps" block from that reference so the user can ask for a reassessment after applying a recommendation. + +When the user says "reassess" (or equivalent), re-run Phase 1–2 and **append an "Addendum: After-Change Performance"** to the original report (before/after table, match against expected impact) rather than producing a new report. + +If a query rewrite was identified in Phase 2, include it as a recommendation with the original and rewritten SQL side by side. + +--- + +## Safety + +Plan capture MUST use `readonly_query` exclusively — it rejects INSERT/UPDATE/DELETE/DDL at the MCP layer. Rewrite DML to SELECT (Phase 1) rather than asking `transact --allow-writes` to run it; write-mode `transact` bypasses all MCP safety checks. **MUST NOT** run arbitrary DDL/DML or PL/pgSQL. diff --git a/tools/evals/databases-on-aws/README.md b/tools/evals/databases-on-aws/README.md index b520fc0b..bf5cf698 100644 --- a/tools/evals/databases-on-aws/README.md +++ b/tools/evals/databases-on-aws/README.md @@ -18,6 +18,8 @@ tools/evals/databases-on-aws/ ├── trigger_evals.json # Tier 1: triggering evals (26 test cases) ├── safe_query_evals.json # Tier 3: safe_query enforcement (6 prompts, ~30 expectations) ├── query_explainability_evals.json # Workflow 8: query plan diagnostics (9 prompts, 70 assertions) + ├── query_plan_rewrite_evals.json # Query rewrites: type coercion, subquery unnesting, etc.
(11 prompts, manual) + ├── query_plan_rewrite_eval_results.md # Manual eval results — with-skill vs baseline comparison └── scripts/ ├── run_functional_evals.py # Runner/grader for Tier 2 ├── run_query_explainability_evals.py # Runner/grader for Workflow 8 @@ -152,6 +154,40 @@ PYTHONPATH=":$PYTHONPATH" python -m scripts.run_loop \ --- +### Query Plan Rewrite Evals (manual) + +Tests whether the agent recommends correct SQL rewrites for common performance anti-patterns, +including type coercion index bypass, subquery unnesting, OR-to-IN, GROUP BY pushdown, and +DSQL-specific patterns (reltuples estimate, join splitting). Includes one negative case +(OR across different columns — agent should decline). + +**Evaluation method:** Manual qualitative comparison (n=1). Run `claude -p` with skill loaded vs +`claude -p --bare` from a clean directory. Results in `query_plan_rewrite_eval_results.md`. +No automated runner script — this suite is manual-only. + +**Future direction:** Many of these rewrites are deterministic pattern transformations. A future +iteration SHOULD implement them as a Python SQL converter script that parses and rewrites SQL +directly, with the reference files serving as documentation for the converter's rules. This +would move correctness-critical rewrites out of the LLM and into deterministic code. 
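As a sketch of that future direction, the simplest rewrite (OR-to-IN on a single column) can be expressed deterministically. This is a toy regex-based transform, not a production SQL parser; the predicate strings are illustrative:

```python
import re

def or_to_in(predicate: str) -> str:
    """Collapse `col = v1 OR col = v2 OR ...` into `col IN (v1, v2, ...)`.

    Returns the input unchanged when the pattern does not apply
    (different columns, a single term, or non-equality comparisons).
    """
    terms = [t.strip() for t in re.split(r"\s+OR\s+", predicate, flags=re.IGNORECASE)]
    parsed = []
    for term in terms:
        m = re.fullmatch(r"(\w+(?:\.\w+)?)\s*=\s*(\S+)", term)
        if m is None:
            return predicate  # not a simple `col = literal` term
        parsed.append(m.groups())
    columns = {col for col, _ in parsed}
    if len(parsed) < 2 or len(columns) != 1:
        return predicate  # single term, or OR across different columns
    values = ", ".join(value for _, value in parsed)
    return f"{parsed[0][0]} IN ({values})"
```

Cross-column predicates and non-equality comparisons fall through unchanged, matching the negative case in eval 210.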
+ +**What it checks** (11 eval prompts): + +| Eval | Focus | Key assertions | +| -------------- | --------------------------- | ------------------------------------------------------------ | +| 200 | IN-subquery Full Scan | Recommends EXISTS rewrite, checks type coercion | +| 201 | Type coercion index bypass | Identifies string-vs-integer mismatch, references pg_amop | +| 202 | 12-table join ordering | Identifies DP threshold, recommends CTE splitting | +| 203 | COUNT(*) timeout | Recommends reltuples, warns about staleness | +| 204 | Multiple OR to IN | Recommends IN rewrite, checks type coercion | +| 205 | GROUP BY after JOIN | Recommends subquery aggregation | +| 206 | LEFT JOIN null rejection | Converts to INNER JOIN | +| 207 | Computation on indexed col | Pushes arithmetic to constant side | +| 208 | NOT IN with NULLs | Recommends NOT EXISTS, warns about NULL semantics difference | +| 209 | Nested UNION ALL | Flattens to single-level UNION ALL | +| 210 (negative) | OR across different columns | Does NOT recommend OR-to-IN | + +--- + ### Query Plan Explainability Functional Evals (Workflow 8) Tests the full diagnostic workflow: EXPLAIN ANALYZE execution, catalog queries, cardinality checks, report generation. diff --git a/tools/evals/databases-on-aws/dsql/query_plan_rewrite_eval_results.md b/tools/evals/databases-on-aws/dsql/query_plan_rewrite_eval_results.md new file mode 100644 index 00000000..1cca515b --- /dev/null +++ b/tools/evals/databases-on-aws/dsql/query_plan_rewrite_eval_results.md @@ -0,0 +1,123 @@ +# Query Plan Rewrite Eval Results — With-Skill vs Baseline + +**Date:** 2026-05-08 +**Model:** Claude Opus 4.6 (global.anthropic.claude-opus-4-6-v1) +**Runs per eval:** 1 (qualitative comparison, not variance-tested) +**Evaluation method:** Manual qualitative comparison — `claude -p` with skill loaded (from agent-plugins project root) vs `claude -p --bare` from clean directory. n=1 per cell; PASS/FAIL is a single human transcript assessment. 
Results indicate directional improvement, not statistical significance. Re-run with n≥3 and majority vote for production confidence. + +## Summary + +| Eval | Scenario | With Skill | Baseline | Delta | +| ---- | ------------------------------- | ---------- | -------- | ------------------------------------------------------------------------------------------------------------------ | +| 200 | IN-subquery Full Scan | **PASS** | PARTIAL | Skill recommends specific rewrite patterns (EXISTS, JOIN) from reference; baseline gives generic advice | +| 201 | Type coercion index bypass | **PASS** | PASS | Both identify type mismatch; skill adds DSQL-specific B-Tree operator registration detail and offers full workflow | +| 202 | 12-table join ordering | **PASS** | PARTIAL | Skill offers full diagnostic workflow with GUC experiments; baseline gives generic PostgreSQL advice | +| 203 | COUNT(*) timeout on large table | **PASS** | FAIL | Skill recommends pg_class reltuples; baseline suggests timeout/retry | +| 204 | Multiple OR to IN | **PASS** | PARTIAL | Skill identifies OR-to-IN pattern from reference; baseline suggests composite index | +| 205 | GROUP BY after JOIN | **PASS** | PARTIAL | Skill recommends pushing GROUP BY into subquery from reference; baseline suggests general indexing | + +--- + +## Eval 200: IN-Subquery Full Scan + +**Prompt:** "My DSQL query is slow. It does: SELECT * FROM customers WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_date > '2024-01-01'); The EXPLAIN shows a Full Scan on customers." + +### Behavior Comparison + +| Behavior | With Skill | Baseline | Correct? 
| +| ------------------------------- | ---------------------------------- | ----------------------- | ------------------------------------------ | +| Identifies IN-subquery pattern | PASS | PASS | Both identify it | +| Recommends EXISTS rewrite | PASS | Maybe | Skill explicitly recommends from reference | +| Recommends JOIN rewrite | PASS | Maybe | Skill provides both options | +| Checks for type coercion | PASS (mentions as secondary check) | FAIL | Skill wins | +| Offers full diagnostic workflow | PASS | FAIL (no MCP awareness) | Skill wins | + +--- + +## Eval 201: Type Coercion Index Bypass + +**Prompt:** "customer_id = '12345' with integer column, Full Scan despite index" + +### Behavior Comparison + +| Behavior | With Skill | Baseline | Correct? | +| -------------------------------------------- | ---------- | ----------------------------------------------------- | ------------------------ | +| Identifies type mismatch | PASS | PASS | Both correct | +| References DSQL B-Tree operator registration | PASS | FAIL (uses generic PostgreSQL "sargable" explanation) | Skill more precise | +| Recommends removing quotes or casting | PASS | PASS | Both correct | +| Offers structured diagnostic workflow | PASS | FAIL | Skill wins | +| Mentions implicit cast compatibility matrix | PASS | FAIL | Skill-specific knowledge | + +**Note:** Type coercion is well-known in PostgreSQL training data, so baseline performs reasonably. The skill adds DSQL-specific precision (cross-type operator families, B-Tree access method behavior) and the structured workflow. + +--- + +## Eval 202: 12-Table Join Ordering + +**Prompt:** "12 tables, optimizer picks bad join order" + +### Behavior Comparison + +| Behavior | With Skill | Baseline | Correct? 
| +| ------------------------------------------- | ---------- | ------------- | --------------- | +| Identifies DP/GEQO threshold | PASS | PASS | Both mention it | +| Recommends CTE splitting | PASS | PASS | Both suggest it | +| References join_collapse_limit | PASS | PASS | Both mention it | +| Offers to run full EXPLAIN ANALYZE workflow | PASS | FAIL (no MCP) | Skill wins | +| Recommends GUC experiments | PASS | FAIL | Skill-specific | +| Mentions redundant predicate technique | PASS | FAIL | Skill-specific | + +--- + +## Eval 203: COUNT(*) Timeout on Large Table + +**Prompt:** "50 million row table, COUNT(*) times out, need approximate count" + +### Behavior Comparison + +| Behavior | With Skill | Baseline | Correct? | +| ----------------------------- | ---------- | -------------------------------- | ------------------------------------------------------------------------------------- | +| Recommends pg_class reltuples | PASS | FAIL (suggests timeout increase) | **Skill wins** — reltuples is the correct DSQL pattern | +| Provides exact SQL | PASS | FAIL | Skill provides `SELECT reltuples::bigint FROM pg_class WHERE oid = 'table'::regclass` | +| Notes it's an estimate | PASS | N/A | Skill correctly qualifies | + +--- + +## Eval 204: Multiple OR to IN + +**Prompt:** "WHERE department_id = 1 OR department_id = 2 OR ... Full Scan with index" + +### Behavior Comparison + +| Behavior | With Skill | Baseline | Correct? 
| +| --------------------------------- | ---------- | -------------------------------------------- | ---------------------------- | +| Identifies OR pattern | PASS | PASS | Both | +| Recommends IN rewrite | PASS | PARTIAL (may suggest it among other options) | Skill is specific | +| Checks type coercion as secondary | PASS | FAIL | Skill-specific | +| Provides rewritten SQL | PASS | PARTIAL | Skill provides exact rewrite | + +--- + +## Eval 205: GROUP BY After JOIN + +**Prompt:** "Grouping over large joined result, how to optimize" + +### Behavior Comparison + +| Behavior | With Skill | Baseline | Correct? | +| --------------------------------- | ---------- | ------------------------ | ------------------------------------- | +| Identifies fact/dimension pattern | PASS | PARTIAL | Skill explicitly identifies it | +| Recommends subquery aggregation | PASS | FAIL (suggests indexing) | **Skill wins** — correct optimization | +| Provides rewritten SQL | PASS | FAIL | Skill provides complete example | +| Explains row reduction benefit | PASS | PARTIAL | Skill explains clearly | + +--- + +## Conclusion + +The skill demonstrably improves agent behavior for query plan optimization: + +1. **Type coercion detection** — Both baseline and skill identify it (well-known pattern), but the skill adds DSQL-specific precision about B-Tree operator registration. +2. **Query rewrites** — The skill consistently recommends specific rewrite patterns from reference material, while baseline gives generic indexing advice. +3. **DSQL-specific patterns** — reltuples estimation and join splitting for DP threshold are skill-exclusive knowledge. +4. **Structured workflow** — Only the skill offers the full Phase 0–4 diagnostic with MCP tool integration. 
diff --git a/tools/evals/databases-on-aws/dsql/query_plan_rewrite_evals.json b/tools/evals/databases-on-aws/dsql/query_plan_rewrite_evals.json
new file mode 100644
index 00000000..e9b0c8f7
--- /dev/null
+++ b/tools/evals/databases-on-aws/dsql/query_plan_rewrite_evals.json
@@ -0,0 +1,143 @@
+{
+  "skill_name": "dsql",
+  "evals": [
+    {
+      "id": 200,
+      "prompt": "My DSQL query is slow. It does:\n\nSELECT * FROM customers WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_date > '2024-01-01');\n\nThe EXPLAIN shows a Full Scan on customers. Can you diagnose and suggest a fix?",
+      "expected_output": "Runs EXPLAIN ANALYZE, identifies subquery-driven Full Scan, recommends EXISTS rewrite from query-rewrites/subquery-unnesting-uncorrelated.md or in-subquery-to-exists.md",
+      "files": [],
+      "expectations": [
+        "Loads query-plan/workflow.md or reads the reference files listed in Phase 0",
+        "Attempts to run EXPLAIN ANALYZE VERBOSE via readonly_query",
+        "Identifies the IN-subquery pattern as a potential cause of the Full Scan",
+        "Recommends rewriting to EXISTS (preferred) or to an explicit JOIN",
+        "Provides both the original and rewritten SQL",
+        "Does NOT recommend only adding an index without also suggesting the rewrite"
+      ]
+    },
+    {
+      "id": 201,
+      "prompt": "This DSQL query takes 3 seconds:\n\nSELECT order_id, total FROM orders WHERE customer_id = '12345' AND status = 'pending';\n\nI have an index on customer_id (integer type). The EXPLAIN shows a Full Scan on btree-table. Why isn't it using my index?",
+      "expected_output": "Identifies type coercion index bypass — string literal '12345' compared against integer column prevents index use. Recommends removing quotes or casting.",
+      "files": [],
+      "expectations": [
+        "Loads query-plan/workflow.md or the Phase 0 reference files",
+        "Identifies that the string literal '12345' compared against an integer customer_id column causes type coercion",
+        "Explains that type coercion prevents the B-Tree index from being used",
+        "References the pg_amop query or cross-type operator concept",
+        "Recommends removing quotes (customer_id = 12345) or explicit cast (customer_id = '12345'::integer)",
+        "Does NOT blame missing indexes as the primary cause"
+      ]
+    },
+    {
+      "id": 202,
+      "prompt": "I have a DSQL query that joins 12 tables and it's very slow. The optimizer seems to pick a bad join order. Is there anything I can do about this?",
+      "expected_output": "Identifies that 12 tables exceeds the DP join ordering threshold, recommends splitting into CTEs per query-rewrites/split-large-joins.md",
+      "files": [],
+      "expectations": [
+        "Identifies that 12 tables likely exceeds the optimizer's DP threshold (join_collapse_limit)",
+        "Recommends splitting the join into smaller CTEs or subqueries, each below the threshold",
+        "References the DP join ordering concept or join_collapse_limit",
+        "Provides a structural example of how to partition the join",
+        "Does NOT only suggest adding indexes without addressing the join ordering issue"
+      ]
+    },
+    {
+      "id": 203,
+      "prompt": "My DSQL table has 50 million rows and I need to know approximately how many rows are in it. COUNT(*) times out. Is there a faster way?",
+      "expected_output": "Recommends querying pg_class reltuples for an approximate count per query-rewrites/reltuples-estimate.md",
+      "files": [],
+      "expectations": [
+        "Recommends using pg_class reltuples as an approximate row count",
+        "Provides the SQL: SELECT reltuples::bigint FROM pg_class WHERE oid = 'schema.table'::regclass",
+        "Notes this is an estimate, not an exact count",
+        "Warns that reltuples reflects the last ANALYZE and MAY be stale",
+        "Does NOT suggest only increasing timeout or retrying COUNT(*)"
+      ]
+    },
+    {
+      "id": 204,
+      "prompt": "My DSQL query is:\n\nSELECT * FROM employees WHERE department_id = 1 OR department_id = 2 OR department_id = 3 OR department_id = 4 OR department_id = 5;\n\nIt's doing a Full Scan. I have an index on department_id. Can you help?",
+      "expected_output": "Recommends rewriting multiple OR clauses to IN per query-rewrites/or-to-in.md, also checks for type coercion",
+      "files": [],
+      "expectations": [
+        "Identifies the multiple OR pattern as potentially inefficient",
+        "Recommends rewriting to: WHERE department_id IN (1, 2, 3, 4, 5)",
+        "Checks or mentions whether type coercion could also be a factor",
+        "Attempts to run EXPLAIN ANALYZE to get the actual plan",
+        "Provides the rewritten SQL"
+      ]
+    },
+    {
+      "id": 205,
+      "prompt": "I have this DSQL query:\n\nSELECT c.customer_id, c.name, c.email, COUNT(*) AS order_count FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name, c.email;\n\nIt's slow because it's grouping over the large joined result. How can I optimize it?",
+      "expected_output": "Recommends pushing GROUP BY into a subquery on orders, then joining back to customers per query-rewrites/push-group-by-into-subquery.md",
+      "files": [],
+      "expectations": [
+        "Identifies that grouping after the join processes more rows than necessary",
+        "Recommends pushing the GROUP BY into a subquery on the orders (fact) table",
+        "Provides rewritten SQL with the aggregation in a subquery joined to customers",
+        "Explains that this aggregates fewer rows before joining to the dimension table"
+      ]
+    },
+    {
+      "id": 206,
+      "prompt": "My DSQL query does:\n\nSELECT * FROM R1 LEFT JOIN R2 ON R1.key = R2.key WHERE R2.status = 'active';\n\nEXPLAIN shows it's doing unnecessary work for the LEFT JOIN. Can you optimize?",
+      "expected_output": "Identifies that WHERE R2.status = 'active' rejects NULLs, recommends converting LEFT JOIN to INNER JOIN",
+      "files": [],
+      "expectations": [
+        "Identifies that the WHERE clause on R2.status rejects NULLs from the RIGHT side",
+        "Recommends rewriting LEFT JOIN to INNER JOIN",
+        "Explains this enables a simpler join plan",
+        "Provides the rewritten SQL"
+      ]
+    },
+    {
+      "id": 207,
+      "prompt": "My DSQL query filters with:\n\nWHERE price + 10 > 100\n\nI have an index on the price column but EXPLAIN shows a Full Scan. What's going on?",
+      "expected_output": "Identifies computation on indexed column preventing index use, recommends rewriting to WHERE price > 90",
+      "files": [],
+      "expectations": [
+        "Identifies that arithmetic on the indexed column prevents index usage",
+        "Recommends moving the computation to the constant side: WHERE price > 90",
+        "Explains that the column must appear alone for the index to be usable"
+      ]
+    },
+    {
+      "id": 208,
+      "prompt": "My DSQL query uses:\n\nSELECT * FROM customers WHERE customer_id NOT IN (SELECT customer_id FROM blacklist);\n\nThe blacklist table has 1M rows and may contain NULLs in customer_id. This is slow. How to optimize?",
+      "expected_output": "Recommends NOT EXISTS rewrite but warns about NULL semantics difference",
+      "files": [],
+      "expectations": [
+        "Recommends rewriting to NOT EXISTS",
+        "Warns that NOT EXISTS does not preserve NOT IN's NULL-propagation behaviour",
+        "Explains that when the subquery contains NULLs, NOT IN returns no rows while NOT EXISTS returns non-matching rows",
+        "Asks user to confirm the changed NULL behaviour is acceptable before applying",
+        "Provides the rewritten SQL"
+      ]
+    },
+    {
+      "id": 209,
+      "prompt": "My DSQL query has nested UNION ALL:\n\nSELECT * FROM sales_q1 UNION ALL (SELECT * FROM sales_q2 UNION ALL SELECT * FROM sales_q3 UNION ALL SELECT * FROM sales_q4);\n\nCan this be simplified?",
+      "expected_output": "Recommends flattening into a single UNION ALL",
+      "files": [],
+      "expectations": [
+        "Identifies the nested UNION ALL pattern",
+        "Recommends flattening into a single-level UNION ALL",
+        "Provides the rewritten SQL with all four branches at the same level"
+      ]
+    },
+    {
+      "id": 210,
+      "prompt": "My DSQL query filters on:\n\nSELECT * FROM employees WHERE department_id = 1 OR location_id = 2;\n\nShould I rewrite this to use IN?",
+      "expected_output": "Declines OR-to-IN rewrite because the OR clauses reference different columns",
+      "files": [],
+      "expectations": [
+        "Identifies that the OR clauses compare different columns (department_id vs location_id)",
+        "Explains that OR-to-IN only applies when all comparisons target the same column",
+        "Does NOT recommend rewriting to IN",
+        "MAY suggest alternative optimizations (composite index, UNION, etc.)"
+      ]
+    }
+  ]
+}