Skip to content

Comments

Fix comment preservation#3

Open
OTooleMichael wants to merge 3 commits intotobilg:mainfrom
OTooleMichael:fix/comment-preservation
Open

Fix comment preservation#3
OTooleMichael wants to merge 3 commits intotobilg:mainfrom
OTooleMichael:fix/comment-preservation

Conversation

@OTooleMichael
Copy link

There were a few small bugs with comments going missing and one that causes corruption of the SQL (a trailing subquery comment).

I imagine there are more comments going missing, which I also image is always unexpected behaviour, however I just covered a handful of comment locations that are in common locations or may be used for hint directions ala Spark etc.

Three issues fixed:

  1. Line comments (-- ...) were stored as bare text in the tokenizer, causing them to leak as executable SQL during generation. Now normalized to block comment format (/* ... */) at scan time, matching the storage format already used for block comments.

  2. Identifier trailing_comments were discarded in expect_identifier_or_alias_keyword_with_quoted(), which created Identifier { trailing_comments: Vec::new() } instead of preserving the token's trailing_comments.

  3. Comments between ( and SELECT in subqueries were lost because the LParen token's trailing_comments were discarded. Now captured and spliced into the inner Select's leading_comments in both parse_paren() and parse_table_expression().

Three issues fixed:

1. Line comments (-- ...) were stored as bare text in the tokenizer,
   causing them to leak as executable SQL during generation. Now
   normalized to block comment format (/* ... */) at scan time,
   matching the storage format already used for block comments.

2. Identifier trailing_comments were discarded in
   expect_identifier_or_alias_keyword_with_quoted(), which created
   Identifier { trailing_comments: Vec::new() } instead of
   preserving the token's trailing_comments.

3. Comments between ( and SELECT in subqueries were lost because
   the LParen token's trailing_comments were discarded. Now captured
   and spliced into the inner Select's leading_comments in both
   parse_paren() and parse_table_expression().
@OTooleMichael OTooleMichael force-pushed the fix/comment-preservation branch from e0a46a5 to 9c186aa Compare February 16, 2026 22:41
@tobilg
Copy link
Owner

tobilg commented Feb 16, 2026

Thanks, I‘ll have a look!

The LParen trailing_comments were dropped when parsing
parenthesized subqueries in expressions (e.g. WHERE a >= (/* hint */ SELECT ...)).
Splice them into the inner Select's leading_comments, matching
the existing fix for FROM-clause and CTE subqueries.
transform_recursive silently skipped Merge nodes via the catch-all,
meaning transforms (and any AST walks using transform_map) never
descended into MERGE ... ON / USING / WHEN clauses.
@tobilg
Copy link
Owner

tobilg commented Feb 17, 2026

Please run all tests again locally, the CI reports a failing test:

failures:

---- tokens::tests::test_comments stdout ----

thread 'tokens::tests::test_comments' (3081) panicked at crates/polyglot-sql/src/tokens.rs:2765:9:
assertion `left == right` failed
  left: "/* comment */"
 right: "comment"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tokens::tests::test_comments

test result: FAILED. 719 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.32s

error: test failed, to rerun pass `-p polyglot-sql --lib`

All make test-rust-verify tests must pass before I can merge it. Thank you!

@tobilg
Copy link
Owner

tobilg commented Feb 20, 2026

Please retest with the current version v0.1.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants