fix: reimplement MySQL type compatibility for 3.0-dev#24204
fix: reimplement MySQL type compatibility for 3.0-dev#24204ck89119 wants to merge 53 commits intomatrixorigin:3.0-devfrom
Conversation
After rewriting duplicate-entry errors to surface user-visible column or index names in the Optimistic txn path, several existing fixtures in test/distributed/cases/optimistic/ still expected the internal __mo_index_idx_col placeholder. Update the expected messages to the real column name (e.g. 'b', 'col2') or index name (e.g. 'm1', 'm1_index', 'm2_index') for each failing row, matching MySQL semantics. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…elpers
Coverage CI rejected the branch at 67% (threshold 75%) because several
files touched by this PR only had happy-path tests. Add focused unit
tests for the branches we introduced:
- arithmetic_float_overflow: cover sub/mul/div float32/float64 helpers
(happy path and infinity overflow).
- func_cast: cover floatToYear rounding (2155.6 -> rejected, 69.4 ->
2069), NaN rejection, and null propagation.
- build_util: cover SET default / on-update with constant-foldable but
non-literal expressions (concat('read','')) and the error path when
the folded literal is not a valid SET member.
- make.go: cover every literal shape funcCastForSetType switches on,
plus the non-set-type pass-through and the null-literal branch.
- decimal: cover the int32 scale-subtraction overflow guard via
"0.1e2147483600".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coverage CI is still below 75% (73.6% last run); the low-coverage files are all ones this PR touched. Add targeted UTs: - arithmetic.go: drive minusFn / multiFn / divFn overflow paths (float32 and float64) through NewFunctionTestCase so the operator-level branches that call our *WithOverflowCheck helpers are exercised end-to-end. - func_cast.go: cover integerToYear at every int width (int8/16/32/64), null-integer propagation, and the yearToStr happy + null paths. - fuzzyCheck.go: exercise the compound primary key, the onlyInsertHidden (hidden-index alter) shortcut, the empty-name error branch, and reset/clear/TypeName lifecycle helpers. - build_util.go: cover the nil-default, JSON-default-rejection, NOT-NULL-with-null-default-rejection, and uuid()-on-non-UUID-rejection branches of buildDefaultExpr, plus buildOnUpdate's no-attribute case. - decimal.go: cover Parse128/Parse256 leading-whitespace / signed-prefix and hex-literal branches to reach the previously uncovered paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts: # pkg/sql/parsers/dialect/mysql/mysql_sql.go # pkg/sql/parsers/dialect/mysql/mysql_sql_test.go
# Conflicts: # pkg/sql/parsers/dialect/mysql/mysql_sql.go
# Conflicts: # test/distributed/cases/operator/is_operator.result
…rors Single-column hidden unique indexes already reported the user-visible column name, but the composite path in newFuzzyCheck and constructFuzzyFilter still left displayAttr / PkName at the default PkeyColName — which for a hidden index table is __mo_index_idx_col. Mirror what bind_insert.go already emits for DedupColName on compound uniques and build a "(col1,col2,...)" tuple from Fuzzymessage.ParentUniqueCols. OriginTableMessageForFuzzy does not carry the user-declared index name, so a tuple is the closest recognizable form we can produce without a proto extension. Tests: TestNewFuzzyCheckCompositeUniqueIndexDisplayAttr and TestConstructFuzzyFilterCompositeParentUniqueCols. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…W CREATE
Two distinct SHOW CREATE TABLE output bugs handled together:
1. Member-name casing. The column loop lowered the whole typeStr, so
SET('Read','Write') came out as set('read','write') — silently renaming
the SET members. Follow the ENUM branch already in place and lower only
the leading "SET" keyword, keeping the members intact.
2. Non-literal DEFAULT expressions. formatStr() unconditionally doubled
every embedded single quote when the value was not surrounded by quotes.
For a function-call default like concat('read','') the OriginString is
concat('read',''); doubling the interior quotes rewrote it to
concat(''read'',''') and broke round-trip. Emit such values verbatim
(backtick escaping stays) and only escape quotes inside a real string
literal.
Tests: TestFormatStrPreservesNonLiteralDefault and
TestShowCreateSetMemberCasePreservation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
XuPeng-SH
left a comment
There was a problem hiding this comment.
PR #24204 Review: Reimplement MySQL Type Compatibility for 3.0-dev
Size: +7117 / -3534 across ~50 files
Overview
This PR adds MySQL-compatible support for several type system features:
- YEAR type (1901-2155, two-digit mapping, int16 storage)
- SET type (bitmask stored as uint64, up to 64 members)
- GEOMETRY subtypes (POINT, LINESTRING, POLYGON, etc.)
- Decimal256 (precision up to 65 digits, scale max 30)
- Float overflow detection (arithmetic +/-/*/div now rejects Inf)
- Duplicate-entry error rewriting (shows user-visible column names instead of
__mo_index_idx_col)
Strengths
- Excellent test coverage. Every new feature has comprehensive unit tests: cast paths (all integer widths, float rounding, null propagation, overflow rejection), SET member resolution, YEAR range validation, Decimal256 round-trips, float overflow detection, and dup-entry rewriting.
- SET type design is solid.
sync.Mapdescriptor caching, JSON encoding for sets with empty members, bitmask storage as uint64. ThebuildDefaultExpr/buildOnUpdatepaths correctly fold non-literal expressions before SET member resolution. - Float overflow checks are well-placed. Both scalar and array arithmetic paths detect Inf and return
ErrOutOfRange.Exp()also catches overflow. - Dup-entry rewriting is clean. The regex-based approach with the safety guard for multiple unique indexes (falls back to original error) is the right tradeoff.
Issues
High Priority
-
decimal256ToFloathas a suspicious double-parse (func_cast.go, newdecimal256ToFloatfunction):result, err := strconv.ParseFloat(xStr, bitSize) if err != nil { return moerr.NewOutOfRangef(ctx, "float32", ...) } if bitSize == 32 { result, _ = strconv.ParseFloat(xStr, 64) // re-parses as float64? }
When
bitSize == 32, the first parse validates as float32, then it re-parses as float64 — discarding the error. This is copy-pasted fromdecimal128ToFloatwhich has the same pattern, but it looks like a bug in both: for float32 targets you should keep the float32-precision value, not widen to float64. TheT(result)cast at the end narrows it back, but the intermediate float64 value may differ from the float32 parse. -
SET type uses
T_uint64as storage OID. This means anyuint64column with non-emptyEnumvaluesis treated as a SET. This is fragile — if any other codepath accidentally setsEnumvalueson a uint64 column, it would be misidentified. Consider using a dedicated type OID (likeT_set) or at minimum adding a more explicit discriminator. -
Decimal256cast matrix is very limited — onlyfloat32andfloat64are supported targets:types.T_decimal256: { types.T_float32, types.T_float64, },
Missing:
T_char,T_varchar,T_text,T_decimal64,T_decimal128, integer types. Users will get "unsupported cast" errors for common operations likeCAST(decimal256_col AS VARCHAR)orSELECT decimal256_col + int_col. This will likely cause issues in practice. At minimum, string casts should be supported. -
RewriteHiddenIndexDupEntrywithlen(uniques) != 1guard — when a table has multiple unique indexes, the error falls back to the raw__mo_index_idx_colmessage. This is better than misattributing, but could be improved by extracting the duplicated value and matching it against each index's column set. Worth a TODO.
Medium Priority
-
formatStrbehavior change (build_show_util.go) — the function previously escaped all interior single quotes. Now it only escapes quotes inside string literals (detected by leading/trailing'). This is correct for expression defaults likeconcat('read',''), but verify there's no codepath that passes a bare string value (not wrapped in quotes) that contains an apostrophe — those would now pass through unescaped. -
Parse256fallthrough inmake.go— whenParse128fails, the code falls through toParse256. ButParse128can fail for reasons other than value too large (e.g., malformed input). In that case,Parse256will also fail and return its own error, which may be confusing. Consider checking the error type fromParse128before falling through.
Low Priority
-
Geometry subtypes are all mapped to the same OID (
MYSQL_TYPE_GEOMETRY), differentiated only byFamilyString/Enumvalues. Storage layer treats POINT and POLYGON identically. Fine for 3.0-dev scope but should be documented. -
NormalizeSetValuesusessync.Mapfor caching parsed descriptors. The cache never shrinks. Not a practical concern given set definitions are per-column and relatively few.
Summary
Well-structured, thoroughly-tested PR that adds significant MySQL type compatibility. Main concerns are: (1) the very limited Decimal256 cast matrix which will hit users in practice, (2) the fragile SET type detection via T_uint64 + Enumvalues, and (3) the suspicious double-parse in decimal256ToFloat. The float overflow detection and dup-entry rewriting are clean wins. Test coverage is notably strong across the board.
…56 cast matrix Two related decimal cast fixes: 1. decimal128ToFloat / decimal256ToFloat double-parse. When bitSize was 32 the helper called strconv.ParseFloat at 32 (for validation), then re-parsed at 64 and widened — throwing away the float32 quantization the caller asked for. For an input like "3.14159265" this gave a different nearest-representable result than a direct float32 parse. Parse only once at the target bit size. decimal256ToFloat copy-pasted the bug and is fixed the same way. 2. Decimal256 cast matrix was limited to float32/float64, so common SQL like CAST(decimal256_col AS VARCHAR) or an implicit decimal256 -> int would fail with "unsupported cast". Add helpers that mirror the decimal128 ones (signed/unsigned ints, bit, decimal64/128/256, and str/blob/binary/text) and extend the routing table plus the dispatch switch in decimal256ToOthers. Tests cover both the float-precision contract and the new routing paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
makePlan2DecimalExprWithType fell through to Parse256 whenever Parse128
returned any error. For genuinely-malformed input like "not-a-number",
the user then saw the Decimal256 parser's error message instead of the
original illegal-string diagnostic — a confusing outcome since the
problem was syntactic, not range-related.
Only retry with Parse256 when Parse128's failure looked like a width
overflow ("beyond the range"). Otherwise return the 128-bit error to
the caller. Adds TestMakePlan2DecimalExprWithTypeMalformedInputNotRecovered.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Record intent next to the three review points we deliberately deferred: - dup_entry_rewrite.go: multi-unique-index fallback to the raw __mo_index_idx_col message is a known trade-off; add a TODO for a value-type-matched future implementation. - mysql_special_types.go: isSetPlanType still piggy-backs on T_uint64 + non-empty Enumvalues. Note the migration path to a dedicated T_set OID; geometry subtypes share one OID and is called out separately. - set.go: setDescriptorCache is unbounded on purpose because SET definitions are few and per-column; document this so future readers don't mistake it for an oversight. No behavioral change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…range check
The "single-parse at target precision" simplification in the previous round
was incorrect. The old pattern —
result, _ := strconv.ParseFloat(xStr, 32) // magnitude validation
result, _ = strconv.ParseFloat(xStr, 64) // feed full precision below
— was deliberate: floatNumToFixFloat does a Width/Scale boundary check that
must see the original decimal literal, not a float32-quantized value.
Parsing 999.995 at float32 gives ~999.9949951171875. floatNumToFixFloat then
computes v = 999 + Round(0.9949951171875 * 100) / 100 = 999.99, which is
≤ max_value 999.99, so the insert silently succeeds. With the float64 parse
reinstated, v = 1000.00 > 999.99 and the cast reports out-of-range as MySQL
does.
This fixes the float.test BVT regressions that expected
"Data truncation: data out of range: data type float, value '999.995'" on
INSERT decimal(30,3) 999.995 INTO float(5,2).
Replaces the old unit test (which locked in the wrong contract) with a
boundary-range test on both decimal128 and decimal256 paths.
…Entry Single-column UNIQUE hidden index tables use __mo_index_idx_col as their PK column name; the regex matched that and rewrote engine-level dup errors with the user-visible column name. Composite UNIQUE, though, stores the hidden index as a compound PK whose column is __mo_cpkey_col (not __mo_index_idx_col), so composite dup errors leaked the internal name. __mo_cpkey_col is also used by regular user tables with composite PKs, so we can't unconditionally rewrite it — a PK duplicate on a user table would otherwise be mislabelled as a unique-index hit. Only rewrite when the plan also contains a hidden-index-table target node (prefix __mo_index_). Tests cover both the composite-hidden-UNIQUE rewrite path and the user-table composite-PK path that must stay unchanged.
moarray.Cast used O(in[i]) for every element, so a narrowing cast like vecf64 -> vecf32 on a finite value outside float32's range became +/-Inf without any error. Guard against that: if the source element is finite but the converted result is Inf, return an out-of-range error matching the behaviour of scalar float casts. Explicit +/-Inf on the input still passes through.
…e comments The old formatStr unconditionally doubled backticks and then had a has-outer-quotes heuristic to decide whether to escape single quotes too. Both behaviours were wrong for free-form SQL text: - A DEFAULT expression like concat(`col`, 'x') is not wrapped in a backtick identifier, so doubling its interior backticks corrupted the SQL (concat(``col``, 'x')). - Column/table COMMENT strings were concatenated raw into 'X' literals, so a comment with a single quote produced parse-invalid output (COMMENT 'he's ok'). Replace formatStr with two targeted helpers: formatIdent(s) — for contents of `...` (doubles backticks). formatStrLit(s) — returns '...'-quoted literal with ''-escaped quotes. All backtick-wrapped name contexts now use formatIdent. Column, table, and index COMMENT contexts now use formatStrLit (and the old ad-hoc \' escaping for index comments is dropped). DEFAULT expression OriginString is emitted verbatim; the parser already validated it.
… ->bit
overflowForNumericToNumeric only enforced the positive bound on float
narrowing and float->integer casts, so:
- float64 -> float32 with value -1e300 became -Inf instead of erroring,
- float -> intN/uintN with a large-negative or NaN source produced a
platform-dependent magic int (Go's float->int conversion of an out-of-
range or NaN value is implementation-defined),
The numericToBit path was even looser: uint64(math.Round(v)) was applied
unconditionally to float sources, so NaN silently became 0, -1.0 became
0xffffffffffffffff, and +Inf overflowed into an arbitrary bit pattern.
Refactor the float -> integer branches through a single
checkFloatForInteger helper that rejects NaN, +/-Inf, and values rounding
outside the target range (both sides). Mirror the fix for the
float64 -> float32 branch. Add NaN/Inf/negative/above-uint64 guards in
numericToBit's float branch.
Tests exercise negative overflow, NaN, +/-Inf, and negative-to-BIT.
…idden path For CREATE UNIQUE INDEX on an existing table, the DML plan writes directly into the hidden index table with Fuzzymessage = nil. newFuzzyCheck took that branch and left displayAttr at the hidden table's PK column name — __mo_index_idx_col for a single-column UNIQUE, __mo_cpkey_col for a composite one. firstlyCheck and backgroundSQLCheck both call NewDuplicateEntry(entry, f.displayAttr) directly, bypassing the RewriteHiddenIndexDupEntry rewrite layer, so those internal names leaked straight to the user. We do not have the user-declared unique-index name at this point (the plan hid it because we are inserting into the hidden table alone), so pick a neutral placeholder: "unique index". Users still see a clear constraint- violation message; no internal catalog column name is exposed. Tests cover both single-column and composite hidden-only fallback paths.
The EXTERNAL TABLE branch built its INFILE{} key/value pairs and the
FIELDS/LINES clauses with raw fmt.Sprintf("...'%s'..."), so any user-
supplied value containing a single quote (CompressType, Format, JsonData,
FIELDS TERMINATED BY, ENCLOSED BY, ESCAPED BY, LINES STARTING BY, LINES
TERMINATED BY) produced unparseable SHOW CREATE output.
Route every one of those values through formatStrLit so single quotes are
doubled per the MySQL string-literal rule, and fix an incidental bug where
LINES TERMINATED BY printed the whole tree.Terminated struct instead of
its .Value. The byte-level NUL/backslash handling in the existing escape
closure is unchanged.
Column and table comments are stored in pre-doubled MySQL string-literal form. AttributeComment and TableOptionComment productions in mysql_sql.y both pass the parsed STRING through util.DealCommentString, which already duplicates every embedded single quote. The earlier formatStrLit wrap then doubled them again, so a comment like "%$^&*()_+@!'" (one quote) came out of SHOW CREATE as COMMENT '"%$^&*()_+@!''''' (five trailing quotes) instead of the expected COMMENT '"%$^&*()_+@!''' (three trailing quotes). This regressed test/distributed/cases/comment/comment.sql and test/distributed/cases/dml/show/show.test in BVT. Revert the two comment sites to raw 'col.Comment' / 'kv.Value' concatenation, which is correct given the pre-escape invariant. Index comments stay on formatStrLit (the parser does not DealCommentString them).
The coverage job failed at 74.625%, just below the 75% threshold, with
the biggest deficits in pkg/sql/plan/function/func_cast.go (59.3%),
pkg/sql/plan/function/arithmetic.go (75%), and
pkg/sql/plan/build_util.go (74%). The gaps were all in newly-added code
that had happy-path tests but no error-branch coverage.
Add targeted tests:
- TestDecimal256ToOthersDispatcher drives every arm of the
decimal256ToOthers switch (including the unsupported-target default),
the null-input path through each helper, the ParseInt/ParseUint
out-of-range branches, and the Width-bound rejection inside
decimal256ToStr. This lights up the 1669-1716 dispatcher table plus
the unreachable-by-happy-path error arms in each helper.
- TestSetDefaultAndOnUpdateErrorBranches covers the SET branches in
buildDefaultExpr / buildOnUpdate that reject non-foldable values and
values not in the SET members.
- TestGetTypeFromAstRejectsEmptySet triggers the NormalizeSetValues
"empty" error path in getTypeFromAst.
- divFn float32 overflow case added to
TestFloatArithmeticOverflowThroughFunctions (was missing).
- TestArrayArithmeticInfScanRejected drives the per-element IsInf
scan in plusFnArray/minusFnArray/multiFnArray/divFnArray.
No production code changes.
float64 cannot represent math.MaxInt64 (2^63 - 1) or math.MaxUint64 (2^64 - 1) exactly — they round up to 2^63 and 2^64 respectively. The earlier bounds check used `rounded > float64(MaxInt64)` which treated the un-representable 2^63 as in-range; T2(float64) of that value is implementation-defined and Go's int64 conversion produces MinInt64. The same shape of hole existed for float -> uint64 and for the float branch in numericToBit. Replace the upper-bound test on int64 / uint64 / bit(n)-via-uint64 with a strict "rounded >= first-unrepresentable-float" check (math.Pow(2, 63) / math.Pow(2, 64)). Narrower integer targets (int8/16/32, uint8/16/32) keep the plain > compare since their maxima are exactly representable. Regression test exercises the 2^63 / 2^64 boundary and the nearest exactly-representable float below each (2^63 - 1024 / 2^64 - 2048), asserting that the boundary itself is rejected while the representable predecessor still passes.
…allback constructFuzzyFilter initialised displayPkName to the target's Pkey.PkeyColName and only overrode it when n.Fuzzymessage was populated. The CREATE UNIQUE INDEX on an existing table path constructs a fuzzy filter directly against the hidden index table with Fuzzymessage == nil, so pkName stays as __mo_index_idx_col (single column) or __mo_cpkey_col (composite) and gets surfaced by fuzzyfilter.test / testAndAdd into NewDuplicateEntry. This mirrors the newFuzzyCheck hidden-only fix — when the target table is hidden and no Fuzzymessage is attached, label the key as "unique index" instead of the internal catalog column.
formatStrLit only doubled single quotes, and the ENUM/SET member
formatter EscapeFormat did not include '\' in its replace map. The MySQL
scanner treats \<x> as an escape sequence — \b becomes 0x08, \n becomes
newline, \t becomes tab, and a trailing \ collapses into the next byte.
Emitting a raw backslash back into SHOW CREATE output therefore broke
round-trip: a SET('a\b') column serialised as SET('a\b') re-parsed as
SET('a' + 0x08 + 'b'), silently changing the declared member, and the
same held for EXTERNAL TABLE option values routed through formatStrLit.
Escape backslashes first, then single quotes, in formatStrLit (order
matters — the reverse would re-escape the \\ inserted by the quote
step). Add '\\' -> "\\\\" to replaceMap so EscapeFormat covers ENUM/SET
members the same way.
Column/table COMMENT still uses plain '...' concatenation because its
value is stored in pre-escaped form by util.DealCommentString; reworking
that pre-escape contract is out of scope here.
1. Binary->narrow int cast: add bitSize range check after parsing to prevent silent truncation/wrap-around for int8/16/32 and uint8/16/32. 2. decimal128->decimal128 fast path: SetType on result vector instead of mutating the source vector's type. 3. array ScalarOp: reject NaN results (0*Inf) alongside existing Inf check. 4. SHOW CREATE comment: escape backslashes so round-trip through the MySQL scanner preserves literal backslash characters. 5. ON UPDATE OriginString: use WithSingleQuoteString() (matching Default) so string literals in expressions retain their quotes. 6. ENUM metadata: add EncodeEnumValues/DecodeEnumValues with JSON fallback for members containing commas, preventing lossy split on read-back. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. arrayToArray: validate source dimension against target Width before appending, rejecting mismatched array casts instead of silently producing values inconsistent with the declared column type. 2. strToStr: remove the destLen>=255 skip for TEXT sources — explicit CAST(text AS CHAR/VARCHAR(N)) must always enforce the length limit. 3. SHOW CREATE EXTERNAL TABLE: remove pre-escaping of backslash in the escape() helper for ENCLOSED BY / ESCAPED BY — formatStrLit already handles backslash escaping, so pre-escaping caused double-escape. 4. formatStrLit: handle \n, \r, \0 control characters so that CRLF line terminators round-trip distinctly from LF, and NUL bytes serialize correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. arrayToArray: skip dimension check when target Width equals MaxArrayDimension (65535), which is the default for unsized array types — prevents false "dimension mismatch" errors on casts like VECF32(3) -> VECF64. 2. isSameColumnType: fix unconditional `return true` that ignored Width/Scale differences, causing DEFAULT/ON UPDATE expressions to skip necessary casts between same-OID types with different modifiers (e.g. DECIMAL(10,2) vs DECIMAL(18,4)). 3. Fix CI: update Test_strToStr_TextToCharVarchar expectations to match the corrected behavior (TEXT->CHAR/VARCHAR always enforces length). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SHOW CREATE TABLE now correctly escapes backslashes in comments so that round-tripping through the parser preserves the original value. Update the expected result for `inspection_exceptions` COMMENT which contains literal backslashes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What type of PR is this?
Which issue(s) this PR fixes:
issue #24184
What this PR does / why we need it:
This PR reimplements the MySQL type compatibility updates needed by
3.0-devfor issue #24184 without cherry-picking the previous backport commits.It adds or wires:
e+NN, planner literal typing, and string / NULL cast execution paths.exp()and float addition.Validation:
makemake static-checkgo test ./pkg/container/types -count=1go test ./pkg/sql/plan -count=1go test ./pkg/sql/plan/function -count=1go test ./pkg/sql/parsers/dialect/mysql -count=1go test ./pkg/compare ./pkg/sort -count=1go test ./pkg/frontend -run 'Test.*Year|Test.*Geometry|Test.*Set|TestMysqlProtocol|Test_format|Test_getDataFromPipeline' -count=1go test ./pkg/vm/engine/tae/index -run 'Test.*ZM|Test.*Zone' -count=1