Skip to content

Comments

dialect specific function validation#14

Open
archiewood wants to merge 3 commits intotobilg:mainfrom
archiewood:function-validation
Open

dialect specific function validation#14
archiewood wants to merge 3 commits intotobilg:mainfrom
archiewood:function-validation

Conversation

@archiewood
Copy link

@archiewood archiewood commented Feb 18, 2026

#6 adds dialect specific function validation and an implementation for clickhouse functions

  • non blocking just warns (not sure if this is right or should be error)
  • 1700 clickhouse functions generated from select * from system.functions output

it tests:

  • if the function exists (W001)
  • if it has right number of args (W002)
  • correct casing (most clickhouse functions case sensitive) (W003)

notes:

  • i've just inlined the SQL functions into json, and using that to generate the validation/clickhouse.rs file
  • this is a monster PR but ~3,400 lines are the CH function definitions (1700 functions, and then the autogenerated .rs file)

open to other approaches

@archiewood archiewood changed the title feat: function validation dialect specific function validation Feb 18, 2026
@archiewood
Copy link
Author

CleanShot 2026-02-18 at 12 08 32

@tobilg
Copy link
Owner

tobilg commented Feb 22, 2026

Thanks for the PR! Here's some feedback from Codex:

Findings (most important first)

  1. parse_arity_from_syntax currently mis-parses common optional-arg signatures like x[, N], so many arities are wrong.
  • tools/clickhouse-extract/generate-function-catalog.py:68
  • tools/clickhouse-extract/generate-function-catalog.py:73
  • tools/clickhouse-extract/generate-function-catalog.py:82
  • Evidence in generated catalog:
    • crates/polyglot-sql/src/validation/clickhouse.rs:461 has floor as fixed 1 arg (should allow optional second arg).
    • crates/polyglot-sql/src/validation/clickhouse.rs:1124 has roundBankers fixed 1 arg.
    • crates/polyglot-sql/src/validation/clickhouse.rs:934 has parseDateTime64BestEffort fixed 1 arg.
  • I reproduced the parser logic separately: floor(x[, N]) -> (1,1) and parseDateTime64BestEffort(time_string[, precision[, time_zone]]) -> (1,1).
  1. Combinator handling is too permissive and skips validation.
  • crates/polyglot-sql/src/validation/mod.rs:115 only checks base-name existence, not whether base is aggregate.
  • crates/polyglot-sql/src/validation/mod.rs:211 treats combinators as valid and skips arity check entirely.
  • This can miss real errors (false negatives), especially for malformed combinator calls.
  1. New integration tests in this PR are not part of your current strict CI test matrix.
  • PR adds crates/polyglot-sql/tests/function_validation.rs.
  • Current CI (.github/workflows/ci.yml) runs a fixed list of integration tests and does not include function_validation.
  • So this feature could regress without CI catching it.
  1. The PR is currently merge-conflicting with main.

What the PR does well

  • It cleanly hooks into validate(...) and keeps validation non-blocking (warnings only), which matches the PR description.
    • crates/polyglot-sql/src/lib.rs:249
    • crates/polyglot-sql/src/lib.rs:264
  • It implements the three warning classes:
    • unknown function W001
    • arity mismatch W002
    • wrong case W003
    • crates/polyglot-sql/src/validation/mod.rs:207
    • crates/polyglot-sql/src/validation/mod.rs:215
    • crates/polyglot-sql/src/validation/mod.rs:232
  • Catalog size and generation approach are consistent with the stated goal (large CH function coverage).

Does it achieve its stated goal?

Partially. The architecture is sensible for your codebase, but the current arity extraction bug means W002 is not reliable enough yet, and combinator behavior can mask errors.
So the PR does not fully deliver the “right number of args” claim in its current form.

Recommendation before merge

  1. Fix parse_arity_from_syntax for x[, y] and nested optional patterns.
  2. Track is_aggregate in the generated catalog and gate combinators to aggregate bases only.
  3. Add function_validation test execution to CI (and test-rust-verify if you want parity).
  4. Rebase PR dialect specific function validation #14 onto current main and rerun full verification.

archiewood and others added 3 commits February 22, 2026 13:44
- Fix arity parser to correctly handle optional args like x[, N] and
  nested optionals like time_string[, precision[, time_zone]]
- Add is_aggregate field to FunctionSignature and gate combinator
  matching to aggregate base functions only
- Regenerate clickhouse.rs catalog with corrected arities and aggregate tags
- Add function_validation integration test to CI
- Rebase onto main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants