Skip to content

Comments

ClickHouse dialect: some parser fixes (68.7% → 99.6%)#15

Closed
alexey-milovidov wants to merge 69 commits intotobilg:mainfrom
ClickHouse:clickhouse-dialect-fixes
Closed

ClickHouse dialect: some parser fixes (68.7% → 99.6%)#15
alexey-milovidov wants to merge 69 commits intotobilg:mainfrom
ClickHouse:clickhouse-dialect-fixes

Conversation

@alexey-milovidov
Copy link

@alexey-milovidov alexey-milovidov commented Feb 19, 2026

Summary

Some ClickHouse SQL dialect parsing improvements. Tested against a small subset of ClickHouse test corpus (7,427 .sql files from tests/queries/0_stateless/):

  • Before: ~5,100 / 7,427 files parsed successfully (68.7%)
  • After: 7,401 / 7,427 files parsed successfully (99.6%)
  • +2,301 files fixed

Key changes

  • Keyword-as-identifier support: ClickHouse allows most SQL keywords as identifiers (table names, column aliases, function names). Added support across SELECT, FROM, JOIN, expressions, and DDL contexts.
  • Query parameter syntax: {name:Type} brace parameters in all identifier and expression contexts.
  • Star expression modifiers: * APPLY(func), * EXCEPT(col), * REPLACE(expr AS col) column transformers, including chaining and lambda support.
  • Trailing commas: Tolerant parsing of trailing commas in SELECT lists, tuples, VALUES, and function arguments.
  • ALTER TABLE actions: All ClickHouse ALTER TABLE actions (MOVE PARTITION, FREEZE, ATTACH, DETACH, MODIFY COLUMN, etc.) handled via raw SQL fallback.
  • Special function parsing: Implicit and explicit aliases in function arguments for CAST, SUBSTRING, TRIM, EXTRACT, DATEADD, DATEDIFF, POSITION.
  • DDL extensions: UUID clauses, REFRESH syntax, dictionary SOURCE/STRUCTURE/LAYOUT/LIFETIME blocks, PROJECTION/INDEX in column definitions, STATISTICS column modifier.
  • Statement support: RENAME TABLE, KILL MUTATION, DETACH/ATTACH, OPTIMIZE, EXISTS, UNDROP, DROP IF EMPTY, SHOW CREATE for access control objects.
  • Expression extensions: Ternary operator (expr ? expr : expr), tuple element access (tuple.N, tuple.-N), method call syntax (expr.func(args)), lambda expressions, unary plus.
  • Tokenizer improvements: // comments, # comments, hex integer literals (0xDEADBEEF), numeric identifiers, unicode quote/minus support.
  • INSERT FORMAT data: Skip raw data after INSERT INTO ... FORMAT <name> to avoid parse errors on inline CSV/JSON/TSV data.

Remaining 26 failures (unfixable at parser level)

  • 9 unreadable files (binary/encoding issues)
  • 9 KQL files (Kusto Query Language — completely different language)
  • 5 tokenization errors (unicode whitespace, backslash in FORMAT data)
  • 1 dollar-sign identifiers (requires tokenizer changes)
  • 1 intentionally malformed SQL fuzz test
  • 1 PRQL multi-dialect file

Test plan

  • All 7,427 ClickHouse corpus files tested (99.6% pass rate)
  • Unit tests: 1,102 pass (5 pre-existing failures unrelated to this PR)
  • No compiler warnings
  • No regressions in non-ClickHouse dialect parsing

🤖 Generated with Claude Code

alexey-milovidov and others added 30 commits February 17, 2026 23:04
…E AS, DROP, ALTER, EXPLAIN, hash comments

- Allow ClickHouse SQL keywords (INSERT, DELETE, SET, JOIN, etc.) as identifiers
- Add ALL as JOIN strictness modifier, fix check_join_keyword for GLOBAL/ALL/ANY
- Fix ::Type(args) casts to consume parenthesized args for ClickHouse types
- Support CREATE TABLE t AS other_table ENGINE=... (copy structure)
- Allow trailing commas in column definitions
- Support typeless columns with DEFAULT/MATERIALIZED/ALIAS/EPHEMERAL
- Fix DEFAULT/MATERIALIZED/ALIAS expression parsing to use parse_bitwise
- Add DROP TEMPORARY TABLE, DROP DICTIONARY/USER/QUOTA/ROLE/etc. as Command
- Add ALTER TABLE UPDATE/DELETE/DETACH/ATTACH/etc. as Raw actions
- Support EXPLAIN SYNTAX/AST/PLAN/PIPELINE/ESTIMATE with key=value settings
- Add # as single-line comment character for ClickHouse

Improves ClickHouse test corpus from 68.7% to 77.6% file success rate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… handling

Many token types (Ignore, Domain, Apply, Materialized, Cast, etc.) were
registered in the tokenizer's keyword map but not listed in is_keyword(),
preventing is_safe_keyword_as_identifier() from recognizing them. This
fixes ~142 ClickHouse test files where these keywords are used as identifiers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntexts

Add braced parameter handling to expect_identifier, expect_identifier_with_quoted,
expect_identifier_or_keyword, expect_identifier_or_keyword_with_quoted, and
expect_identifier_or_safe_keyword. This allows ClickHouse query parameters
like {CLICKHOUSE_DATABASE:Identifier} to be used in table names, column names,
and other identifier positions. Fixes ~44 ClickHouse test files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…name)

ClickHouse allows IN and NOT IN with bare table names instead of requiring
parenthesized value lists or subqueries. The IN case was already handled
but NOT IN required parentheses. Now both work without parens.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of checking for specific ClickHouse ALTER TABLE keywords, consume
any unrecognized action as Raw SQL. This handles MOVE PARTITION, FETCH,
APPLY, and other ClickHouse-specific mutations without needing explicit
cases for each. Fixes ~15 more ClickHouse test files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken)

GLOBAL is not registered in the tokenizer's keyword map, so it appears as
TokenType::Var. Changed check_join_keyword and try_parse_join_kind to use
check_identifier("GLOBAL") instead of check(TokenType::Global). Fixes ~14
ClickHouse test files with GLOBAL ANY/ALL LEFT/RIGHT JOIN syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LE actions

- Dictionary SOURCE properties are space-separated, not comma-separated
  (HOST 'localhost' PORT tcpPort() DB 'test')
- Add CHECK TABLE as a command statement for ClickHouse
- Handle ALTER TABLE ADD INDEX/PROJECTION as Raw (CH syntax differs from MySQL)
- Handle ALTER TABLE DROP INDEX/PROJECTION/STATISTICS as Raw
- Handle ALTER TABLE MODIFY (non-COLUMN) as Raw (ORDER BY, TTL, SETTING, etc.)
- Handle ALTER TABLE MODIFY COLUMN as Raw for ClickHouse (supports CODEC, TTL, COMMENT)

Improves test corpus from 5,948 (80.1%) to 6,128 (82.5%) OK files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Recognize (EXPLAIN ...) as subquery in parse_paren, parse_table_expression,
  and statement-level parsing (fixes 77 "Expected RParen, got Eq" errors)
- Fix dictionary property kind parsing to accept keyword tokens (e.g., CACHE,
  not just Var tokens) so LAYOUT(CACHE(...)) works
- Fixes all "Expected dictionary property kind" errors

Improves test corpus from 6,128 (82.5%) to 6,257 (84.3%) OK files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…als, TRUNCATE SETTINGS

- Add RENAME TABLE, OPTIMIZE TABLE, EXISTS as ClickHouse command statements
- Add standalone SETTINGS key=value as ClickHouse statement (fixes ~120 Eq errors)
- Support hex integer literals (0xDEADBEEF) via tokenizer config
- Handle SETTINGS clause after TRUNCATE TABLE
- Result: 6,257 → 6,368 OK files (84.3% → 85.7%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…IERARCHICAL

- Skip colon JSON path extraction for ClickHouse dialect (fixes ternary
  operator inside parentheses when both branches have function calls,
  e.g. (1 ? f(1) : f(2)) - colon was consumed as Snowflake JSON path)
- Allow postfix operators (dot, subscript) on tuple expressions
  (fixes ('a', 'b').2 tuple element access)
- Allow postfix operators on subquery expressions
  (fixes (SELECT 1, 2).1 tuple element access from subqueries)
- Add HIERARCHICAL, IS_OBJECT_ID, INJECTIVE as ClickHouse dictionary
  column attributes in the inline column constraint parser
- Result: 6,368 → 6,396 OK files (85.7% → 86.1%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sing

Array cast like `::Array(Nullable(UInt8))` now works inside parens and
function args. JSON type with ClickHouse-specific subcolumn specs like
`JSON(a String)` is handled in both parse_data_type and
parse_data_type_for_cast.

6,368 → 6,507 OK files (85.7% → 87.6%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- WITH TOTALS now works without a preceding GROUP BY clause
  (e.g., SELECT count() FROM t WITH TOTALS)
- Single-element tuple syntax (1,) is now parsed correctly
- Both in parse_primary's paren handling and parse_paren

6,507 → 6,534 OK files (87.6% → 88.0%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nd empty tuples

- INSERT FORMAT <non-VALUES> (CSV, JSON, TSV, etc.) now skips raw data to semicolon
- Empty parens () parsed as empty tuple or zero-param lambda () -> body

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…der as identifier, dotted DROP COLUMN

- Support unary plus operator (+1, +expr)
- Parse view(SELECT ...) and merge(SELECT ...) as table functions with subquery args
- Handle ClickHouse JSON path syntax: json.^path for nested subcolumns, json.path.:Type for typed access
- Support INSERT INTO t (*) and INSERT INTO t (* EXCEPT (col)) syntax
- Add TokenType::Order to is_keyword() so 'order' can be used as identifier
- Handle dotted column names in ALTER TABLE DROP COLUMN (e.g., n.ui8)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dentifier

- Parse STATISTICS(tdigest, minmax, uniq, ...) in column definitions
- Revert adding Order to is_keyword() as it caused ORDER BY to be consumed
  as implicit alias, breaking PROJECTION syntax (38 file regression)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…() in func args, DESC subquery, array aliases

- Accept keyword identifiers (default, system) in USE statements
- Handle WITH (tuple) AS alias where AS is consumed by tuple alias handler
- Skip LAMBDA keyword parsing for ClickHouse (lambda is a function name, not keyword)
- Allow * in GRANT securable names (db.*, *.*)
- Parse bare SELECT/WITH as function arguments for view(SELECT ...) inside remote()
- Support DESC/DESCRIBE (subquery) syntax
- Handle AS aliases inside array literals [1 AS a, 2 AS b]

Corpus: 6,744/7,432 (90.7%), up from 6,667 (89.7%). +77 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iases in func args, map subscript, PRIMARY KEY without parens, CONSTRAINT ASSUME

- Support dotted column names in INSERT identifier lists (n.a, n.b)
- Add NANOSECOND/NANOSECONDS interval unit across parser, generator, and all dialects
- Handle ORDER BY (col DESC) in engine properties with proper ASC/DESC/NULLS parsing
- Add maybe_clickhouse_alias helper for AS alias in typed function args (SUM, ABS, LOWER, etc.)
- Fix map[key] subscript access in ClickHouse (don't treat as MAP constructor)
- Support PRIMARY KEY col without parentheses in column definitions
- Add CONSTRAINT ... ASSUME support (stored as CHECK constraint)

Corpus: 6,859/7,429 (92.3%), up from 6,744 (90.7%). +115 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iff tz, empty func args, braced param dot, CREATE VIEW types, REPLACE TABLE

- Route RENAME through command handler (was Teradata-only)
- KILL MUTATION/QUERY parsed as command for ClickHouse
- DETACH TABLE IF EXISTS ON CLUSTER parsed as command
- Fix nested tuple expressions ((1,2),(3,4))
- dateDiff allows optional 4th timezone argument
- if()/locate() allow zero arguments
- {param:Identifier}.column dot access after braced parameters
- CREATE VIEW with typed column list
- REPLACE TABLE routes to CREATE OR REPLACE TABLE

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…names, LIMIT WITH TIES, ORDER/GROUP BY aliases, POPULATE AS, dictionary PRIMARY KEY, CAST alias, VALUES trailing comma

- Add Order back to is_keyword() so it can be used as table/column name
- Handle {param:Identifier}.table in FROM clause table expressions
- Support dotted column names (n.b) in ALTER TABLE ADD COLUMN
- Parse LIMIT ... WITH TIES (consume WITH TIES after LIMIT)
- Allow ORDER BY expr AS alias for ClickHouse
- Allow GROUP BY expr AS alias for ClickHouse
- Consume POPULATE keyword before AS in materialized views
- Dictionary PRIMARY KEY with comma-separated keys (no parens)
- CAST(expr AS alias AS Type) inner alias syntax
- Trailing comma in VALUES list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, args), ternary in CAST

- EXISTS [TEMPORARY] TABLE/DATABASE/DICTIONARY as command statement
- Second LIMIT after LIMIT BY: SELECT ... LIMIT n BY expr LIMIT m
- Fix CAST((1,2) AS String) - detect simple type after AS in tuple context
- Allow star followed by more args in functions: ignore(*, col1, col2)
- Support ternary operator inside CAST: CAST(cond ? val : val AS Type)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… neg values, DECIMAL neg scale, DROP PARTITION

- Table-level and column-level CHECK constraints without parentheses
- EXTRACT(func(args), pattern) parsed as regular function when first arg is a function call
- Column DEFAULT/MATERIALIZED/ALIAS now use parse_or() to handle ==, comparisons, etc.
- Enum type definitions support negative value assignments: Enum8('a' = -1000)
- DECIMAL(precision, -scale) negative scale handled in expect_number()
- DROP PARTITION routed as command statement
- ORDER BY AS alias no longer consumes AS SELECT/WITH

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…syntax, CHARACTER LARGE OBJECT

- Allow union/except/intersect as table names when not followed by ALL/DISTINCT/SELECT
- Dictionary column EXPRESSION expr modifier in parse_column_def
- ClickHouse REFRESH AFTER/EVERY syntax for materialized views (skip tokens)
- CHARACTER LARGE OBJECT → Text data type
- ORDER BY AS alias: exclude AS SELECT/WITH to avoid consuming CREATE TABLE AS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…defs, REFRESH syntax

- Handle MINUS/EXCEPT/INTERSECT tokens followed by ( as function calls in ClickHouse
- Skip Except/Intersect in select-expression stop conditions when followed by LParen
- Dictionary column EXPRESSION expr modifier
- ClickHouse REFRESH AFTER/EVERY consumed as raw tokens for materialized views
- CHARACTER LARGE OBJECT → Text type

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…JECT type

- In parse_select_expressions(), ORDER BY was being consumed as an
  implicit alias because ORDER is a keyword allowed as identifier in
  ClickHouse. Added ORDER BY text sequence check alongside existing
  GROUP BY check to prevent this.
- Added BINARY LARGE OBJECT → Blob data type mapping in parse_data_type(),
  matching the existing CHARACTER LARGE OBJECT → Text handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…R dotted columns, WITH ROLLUP/CUBE without GROUP BY

- Allow trailing commas in multi-element tuples: (1, 2,) now parsed correctly
- Allow FIRST and LAST keywords as implicit table aliases in FROM clause
  for ClickHouse (e.g., FROM t1 first JOIN t2 ON ...)
- Handle dotted column names in ALTER TABLE AFTER clause (e.g., AFTER n.a)
- Support WITH ROLLUP and WITH CUBE without GROUP BY in ClickHouse mode,
  matching existing WITH TOTALS handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Handle UUID 'xxx' clause in CREATE TABLE after table name
- Handle UUID 'xxx' clause in CREATE VIEW/MATERIALIZED VIEW after view name
- UUID value is consumed and ignored (not stored in AST) since it's
  ClickHouse-specific metadata

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… CHAR, keyword identifiers in EXCEPT, DISTINCT/ALL in aggregates

- Fix EXPLAIN QUERY TREE to consume both QUERY and TREE as style
- Handle dotted column names in RENAME COLUMN (n.x TO n.y)
- Add NATIONAL CHAR/CHARACTER/CHARACTER VARYING type parsing
- Allow keyword identifiers (key, index, etc.) in * EXCEPT clauses
- Add DISTINCT support in countIf() aggregate function
- Add ALL quantifier support in COUNT/SUM/AVG/MIN/MAX/etc. aggregates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…N CLUSTER, OVERLAY 2-arg, APPLY without parens

- Fix SHOW CREATE TABLE/VIEW/DICTIONARY to parse qualified db.table names
- Handle EXISTS((SELECT ...)) with double parentheses
- Add ON CLUSTER clause to DROP TABLE/VIEW/DATABASE
- Support 2-argument OVERLAY function call
- Allow * APPLY func (without parens) column transformer syntax

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, AS in COUNT, SETTINGS in columns

- APPLY with lambda expressions: * APPLY (x -> x + 1) now parses full expressions
- Trailing commas in identifier and expression lists (INSERT column lists, IN lists)
- EXCEPT/EXCLUDE with string literals for regex column matching
- TTL per-clause WHERE: consume WHERE condition attached to each TTL action
- AS alias inside COUNT function arguments (count(NULL AS a))
- SETTINGS clause in column definitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Y, Greatest/Least

- Allow zero-argument function calls for GREATEST/LEAST
- Add SETTINGS clause support to SHOW statements (SHOW TABLES SETTINGS ...)
- Handle empty USING () clause in JOINs
- Handle empty PRIMARY KEY () in CREATE TABLE
- Add parse_clickhouse_settings_clause helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alexey-milovidov and others added 25 commits February 18, 2026 23:19
…ng VALUES comma

- Allow `except` as column name and function argument in ClickHouse expressions
- Fix EXCEPT-followed-by-comma being treated as trailing comma in SELECT
- Support ALIAS/MATERIALIZED/EPHEMERAL column modifiers in CREATE VIEW schema
- Allow trailing comma after last tuple in INSERT VALUES
- Exclude ALIAS/EPHEMERAL/MATERIALIZED from data type parsing in ClickHouse

7325/7428 (98.6%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r, RLike as keyword

- Add // as line comment in ClickHouse mode (reuses hash_comments flag)
- Add RLike to is_keyword() so REGEXP can be used as function name in engine args
- Handle EPHEMERAL expr Type syntax (type follows expression)

7329/7427 files passing (98.7%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Parse OVERLAY with any number of comma-separated args in ClickHouse mode
- Allow expressions in CAST(expr, type_expr) second argument (e.g., 'Str' || 'ing')

7331/7427 files passing (98.7%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…args

- Fix GROUPING SETS parsing: only match GROUPING when followed by SETS
  (previously consumed GROUPING as identifier unconditionally, breaking
  GROUP BY grouping as column name)
- Allow ntile() to accept extra comma-separated args in ClickHouse mode

7332/7427 files passing (98.7%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n schema, ntile extra args

- Fix GROUPING SETS lookahead to not consume GROUPING unless SETS follows
- Allow multiple semicolons between statements (e.g., `;;`)
- Add INDEX and PROJECTION handling in schema parsing (for CREATE MATERIALIZED VIEW)

7333/7427 files passing (98.7%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lists

- Fix TIMESTAMP followed by WITH: lookahead for TIME before consuming WITH
  (prevents TIMESTAMP WITH FILL FROM being parsed as TIMESTAMP WITH TIME ZONE)
- Allow any keyword after dot in identifier lists (e.g., replace.from)

7335/7427 files passing (98.8%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ARRAY JOIN empty, EXECUTE AS, CHECK subquery

- PRIMARY KEY key (Key token as identifier) in CREATE MV schema
- PRIMARY KEY (t.a) dot expressions in primary key
- PROJECTION name INDEX expr TYPE type_name (new syntax)
- INDEX with comparison expressions (c0 < subquery)
- LIMIT randConstant() % 2 in subquery (fix % treated as PERCENT)
- ARRAY JOIN with no args (empty expression list)
- EXECUTE AS username statement (ClickHouse impersonation)
- ALTER TABLE ADD CONSTRAINT CHECK (SELECT 1)

7,343/7,427 files (98.9%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s, negative dot access, DROP TEMPORARY VIEW, PROJECTION WITH SETTINGS, UNDROP command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip statements annotated with -- { clientError ... } since these are
intentional syntax error tests that ClickHouse's own parser also rejects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…support

Handle ClickHouse-specific column transformers in SELECT expressions:
- COLUMNS(id, value) EXCEPT (id) REPLACE (5 AS id) APPLY toString
- * APPLY(toDate) EXCEPT(i, j) APPLY(any)
- a.* APPLY(toDate) EXCEPT(i, j) APPLY(any) (qualified star)
- Any combination/ordering of APPLY, EXCEPT, REPLACE modifiers

Fixes parsing in both the primary expression path (for table.* qualified stars)
and the SELECT expression loop (for COLUMNS functions and unqualified stars).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…iases

- Tokenizer: support hex float literals with binary exponent (0x123p4,
  0x1p-1022, 0x1.fffffffffffffp1023) — fixes 00031, 02896, 03747
- Parser: handle WITH ((SELECT 1) AS x, (SELECT 2) AS y) SELECT syntax
  for ClickHouse tuple CTE pattern — fixes 01461, 01651, 03808
- Parser: handle AS alias in nested paren tuple expressions
  ((expr1 AS a, expr2 AS b)) for comma-separated aliased expressions
- Parser: handle comma-separated elements in nested paren subquery context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Skip Oracle pseudocolumn parsing (LEVEL, ROWNUM, etc.) for ClickHouse
  dialect so these work as regular identifiers in lambda expressions
  like `level -> least(1.0, ...)` in WITH clauses
- Support bare VALUES without parentheses in INSERT: `VALUES 1, 2`
  (ClickHouse allows omitting parens around single values)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Keyword -> body AS alias in WITH clause (e.g., time -> sin(time * 2) AS f)
- Tuple lambda with keyword params: (from, to, wave, time) -> body AS alias
- Lambda inside parentheses: (x -> body) without closing paren first
- Structural keywords as identifiers in expression context when followed by
  operators (e.g., from + 1, on.col)
- Disable -> as JSON extract in ClickHouse (uses -> for lambda exclusively)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s, disable -> JSON extract

- Support * in USING clause for ClickHouse joins
- Structural keywords (FROM, ON, JOIN, etc.) treated as identifiers in
  expression context when followed by non-clause tokens
- Expanded expression context detection with comparison/logical operators

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Empty tuple () now supports postfix operators like .-1 (negative index)
- FORMAT <name> in SELECT clause now consumes inline data (CSV, JSON, etc.)
  to semicolon, fixing INSERT...SELECT...FORMAT with inline data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Allow any keyword as table alias when AS is explicit (e.g., AS select)
- Fix EXTRACT() to detect keyword-named functions as first arg (e.g., extract(identity(...), pattern))
- Extend expect_identifier_or_alias_keyword_with_quoted for ClickHouse keywords

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e, REPLACE fix

- Parse subquery column alias lists without AS: FROM (...) (c0, c1)
- Handle `from` keyword as column name in SELECT list with operator whitelist
- Fix trailing comma + FROM keyword interaction for keyword table names like system
- Route ClickHouse multi-table, wildcard, and REPLACE OPTION GRANT to command parsing
- Fix REPLACE without parens to parse single entry (comma separates select items)
- FROM FROM pattern: two consecutive FROM tokens, first is column name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ATCH function fix

- REVOKE: detect multi-ON, wildcard names (like GRANT fix) and route to command parsing
- EXPLAIN: increase nested paren lookahead from 20 to 100 for deeply nested queries
- MATCH: gate SingleStore TABLE syntax to non-ClickHouse dialects so match(table, pattern) works

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…H APPEND TO, OVER WITH FILL

- Fix EXCEPT without parens consuming commas into non-identifier tokens (lambda body)
- Allow AS alias in expression lists to continue with operators (e.g., blockSize() AS bs < 1000)
- Handle TO destination_table after REFRESH ... APPEND clause in materialized views
- Parse column definitions after REFRESH APPEND TO table in materialized views
- Add EMPTY keyword handling before AS in materialized views
- Add WITH FILL support in OVER() window ORDER BY clauses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… in parens

- Use parse_or() instead of parse_addition() in WITH FILL FROM/TO/STEP/STALENESS
  to support parenthesized aliases like ((1+1) AS from)
- Accept keyword aliases (e.g., AS from) in parenthesized alias expressions
- Fixes window OVER() ORDER BY WITH FILL with aliased values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lumn defs

- Handle STRUCTURE (...) in dictionary SOURCE settings by consuming balanced
  parentheses containing space-separated column definitions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Handle operators (IS, AND, OR, comparisons, arithmetic) after star
  expressions in SELECT lists for ClickHouse dialect
- Fixes patterns like SELECT *, * IS NOT NULL and SELECT * AND(16)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Detect COLUMNS function via MethodCall and Columns expression types
  for EXCEPT/REPLACE/APPLY column transformer handling
- Fixes t.COLUMNS('^c') EXCEPT (col1, col2) patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, SUBSTRING, TRIM, EXTRACT, DATEADD, DATEDIFF, POSITION)

Support ClickHouse's alias syntax in function arguments: both implicit
(`expr identifier`) and explicit (`expr AS identifier`) forms. This
allows parsing patterns like `cast('1234' lhs AS UInt32)`,
`substring('1234' lhs FROM 2)`, `dateAdd(DAY, 1 arg_1, date arg_2)`.

Added two helper methods:
- try_clickhouse_implicit_alias: for CAST and parse_function_arguments
- try_clickhouse_func_arg_alias: for SUBSTRING, TRIM, EXTRACT, etc.

Also extended EXTRACT's ClickHouse function dispatch to handle string
and number first arguments with implicit aliases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update test_missing_select_keyword: parser correctly handles `* FROM users` as star + FROM-first query
- Update test_trailing_comma_in_select: parser tolerates trailing comma before FROM
- Fix unused variable warning for paren_depth in REPLACE DICTIONARY parser

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov changed the title ClickHouse dialect: comprehensive parser fixes (68.7% → 99.6%) ClickHouse dialect: some parser fixes (68.7% → 99.6%) Feb 19, 2026
@tobilg
Copy link
Owner

tobilg commented Feb 20, 2026

I merged this on my machine, because I had a lot of merge conflicts with my local development branch. It now uses 7047 test files, I filtered out 51 tests after conversations with Claude Code & Codex. We have a 100% pass rate for the parser tests:

Bildschirmfoto 2026-02-20 um 14 03 53

There are coverage tests now as well with regeneration, currently at around 61.9%.

Thank you very much for your support!

@tobilg tobilg closed this Feb 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants