You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently duckdb-excel uses cell formats for type inferencing. But the cell styles recognized seem to have gaps.
For example, the extension cannot infer the column type to be an INTEGER, despite the cell format indicating 0 decimal places. This is supported when using the spatial extension.
When reading XLSX files, almost everything is read as either DOUBLE or VARCHAR depending on the Excel cell type. However, there are some caveats.
We try to infer TIMESTAMP, TIME, DATE and BOOLEAN types when possible based on the cell format.
Example excel file with a numeric column formatted with 0 Decimal places - test.xlsx
Additional Information
There are also other issues with the cell style based inference, such as with "Date" columns as mentioned in #13 (comment).
The text was updated successfully, but these errors were encountered:
suhas-dc
changed the title
Support inferring column as INTEGER in read_xlsx
Support inferring a wider range of column types including INTEGER
Feb 10, 2025
suhas-dc
changed the title
Support inferring a wider range of column types including INTEGER
Support inferring column as INTEGER in read_xlsx
Feb 10, 2025
Hello! thanks for opening this issue!
This sounds like a good idea, the only problem I could foresee with this is that Excel stores numbers as doubles under the hood, so in theory you could have integer-looking values that far surpass the range of the INTEGERBIGINT types - so the inference would have to take the cell values into account as well before narrowing a DOUBLE column to a INTEGER/BIGINT.
suhas-dc
changed the title
Support inferring column as INTEGER in read_xlsx
Support inferring a wider range of column types including INTEGER
Feb 10, 2025
suhas-dc
changed the title
Support inferring a wider range of column types including INTEGER
Support inferring a wider range of column types including INTEGER similar to spatial st_read
Feb 10, 2025
I noticed that if the first row of a Date column is an empty cell with "Date" formatting, duckdb-excel fails to infer the column as Date. But spatial seems to infer Date correctly.
Since you mentioned that parity with spatial's non-sample based inference is a target, I reframed the issue around that.
Currently duckdb-excel uses cell formats for type inferencing. But the cell styles recognized seem to have gaps.
For example, the extension cannot infer the column type to be an INTEGER, despite the cell format indicating 0 decimal places. This is supported when using the spatial extension.
Example excel file with a numeric column formatted with 0 Decimal places - test.xlsx
Additional Information
There are also other issues with the cell style based inference, such as with "Date" columns as mentioned in #13 (comment).
The text was updated successfully, but these errors were encountered: