feat: add optional parameters for copy table from command #7115
Conversation
Hi @Standing-Man, thank you for your contribution! Could you add some SQLness tests or integration tests under the tests-integration directory? You can find an example in
Hi @WenyXu, I’m a bit confused. So far, I see how a stream connects to a CSV source, but I’m unsure where the CSV reading and processing logic happens. Could you point me in the right direction?
Yes
It depends on the CSV reader’s capabilities. If users attempt to copy data from multiple files and one of them has a schema that differs from the target table, we should consider skipping data from that file. It would be best to review the CSV reader’s source code.
Hi @WenyXu, no matter whether
Also, the CSV reader reads rows from the source in batches, with a default batch size of
I don’t think it’s necessary to add these two optional parameters to enhance the
Yes, this is the expected behavior. The underlying CSV reader supports reading files without a header line — in such cases, the fields are named field_0, field_1, and so on. Therefore, we assume that the column order in the input CSV matches that of the target table, and align the CSV schema with the table schema before performing the compatibility check.
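The alignment step described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `Field` is a plain stand-in for Arrow's field type, and `align_with_table` is a hypothetical helper that renames the reader's synthetic `field_0`, `field_1`, … columns to the table's column names, matching purely by position.

```rust
// Hypothetical stand-in for Arrow's Field; real code would use arrow_schema.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Rename the CSV reader's synthetic columns (field_0, field_1, ...)
/// to the table's column names, matching purely by position, so that
/// a later name/type compatibility check can run.
fn align_with_table(csv_fields: &[Field], table_fields: &[Field]) -> Vec<Field> {
    csv_fields
        .iter()
        .zip(table_fields)
        .map(|(csv, table)| Field {
            name: table.name.clone(),
            data_type: csv.data_type.clone(),
        })
        .collect()
}

fn main() {
    let csv = vec![
        Field { name: "field_0".into(), data_type: "Int64".into() },
        Field { name: "field_1".into(), data_type: "Utf8".into() },
    ];
    let table = vec![
        Field { name: "ts".into(), data_type: "Int64".into() },
        Field { name: "host".into(), data_type: "Utf8".into() },
    ];
    let aligned = align_with_table(&csv, &table);
    assert_eq!(aligned[0].name, "ts");
    assert_eq!(aligned[1].name, "host");
    println!("{:?}", aligned);
}
```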
@killme2008 and @WenyXu PTAL when you have time.
Signed-off-by: Alan Tang <[email protected]>
Signed-off-by: StandingMan <[email protected]>
```rust
fn ensure_schema_compatible(from: &SchemaRef, to: &SchemaRef) -> Result<()> {
    if from.fields().len() != to.fields().len() {
        return error::InvalidHeaderSnafu {
            table_schema: to.to_string(),
            file_schema: from.to_string(),
        }
        .fail();
    }
```
We can’t simply determine schema incompatibility based on field length. If the source schema is a subset of the target schema, they should be considered compatible.
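A subset-based check along the lines suggested above might look like the sketch below. This is an illustrative draft, not the reviewed code: `Field` is again a plain stand-in for Arrow's field type, and the exact compatibility rules (defaults, nullability) would need to follow the real implementation.

```rust
// Hypothetical stand-in for Arrow's Field.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// The source schema is compatible when it is a subset of the target:
/// every source field must exist in the target with the same type,
/// while the target may carry extra columns.
fn is_subset_compatible(from: &[Field], to: &[Field]) -> bool {
    from.iter().all(|f| {
        to.iter()
            .any(|t| t.name == f.name && t.data_type == f.data_type)
    })
}

fn main() {
    let file = vec![Field { name: "host".into(), data_type: "Utf8".into() }];
    let table = vec![
        Field { name: "host".into(), data_type: "Utf8".into() },
        Field { name: "ts".into(), data_type: "Int64".into() },
    ];
    assert!(is_subset_compatible(&file, &table));
    // A type mismatch on a shared column name is incompatible.
    let bad = vec![Field { name: "host".into(), data_type: "Int64".into() }];
    assert!(!is_subset_compatible(&bad, &table));
}
```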
Should we implement a separate header checker method to verify whether the schema lengths of the file and the table match, and whether their names and types are consistent?
Yes, and I suggest adding more tests to cover this feature, as there are many cases we need to handle.
I think it would be better to add some test cases in tests-integration to cover the scenarios when header=false, e.g.:
- The order of CSV fields differs from the table’s.
- The number of CSV fields is fewer or greater than the table’s.
BTW, when testing with header=false, please make sure to use a CSV file without a header row.
Good suggestions. If a CSV file doesn’t contain a header, can I skip checking the header?
> If a CSV file doesn’t contain a header, can I skip checking the header?
I think we should assume that the data fields in the CSV are in the same order as the table columns.
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
- headers (default: false): Enforce validation to ensure that the imported CSV file matches the table’s schema, including column names and types.
- continue_on_error (default: true): If a mismatch is detected between the imported CSV file and the table schema, and continue_on_error is set to true, the file will be skipped and the next CSV file will be processed. Otherwise, the process will stop.

PR Checklist
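The continue_on_error behavior described above can be sketched as a simple loop over the input files. This is an illustration of the semantics only, not the PR's implementation: `copy_files` and `check_schema` are hypothetical helpers, and the real code operates on streams rather than file names.

```rust
/// Copy each file in turn; on a schema mismatch either skip the file
/// (continue_on_error = true) or abort the whole operation.
fn copy_files(files: &[&str], continue_on_error: bool) -> Result<usize, String> {
    let mut copied = 0;
    for file in files {
        match check_schema(file) {
            Ok(()) => copied += 1,
            Err(e) if continue_on_error => {
                // Skip the incompatible file and move on to the next one.
                eprintln!("skipping {file}: {e}");
            }
            Err(e) => return Err(format!("{file}: {e}")),
        }
    }
    Ok(copied)
}

// Hypothetical checker: pretend files named "bad*" mismatch the table schema.
fn check_schema(file: &str) -> Result<(), String> {
    if file.contains("bad") {
        Err("schema mismatch".into())
    } else {
        Ok(())
    }
}

fn main() {
    // With continue_on_error = true, the bad file is skipped.
    assert_eq!(copy_files(&["a.csv", "bad.csv", "c.csv"], true), Ok(2));
    // With continue_on_error = false, the copy aborts on the bad file.
    assert!(copy_files(&["a.csv", "bad.csv", "c.csv"], false).is_err());
}
```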
Please convert it to a draft if some of the following conditions are not met.