Changes from 8 commits
10 changes: 10 additions & 0 deletions docs/en/connector-v2/source/CosFile.md
@@ -83,6 +83,8 @@ To use this connector you need to put hadoop-cos-{hadoop.version}-{version}.jar and
| common-options | | no | - |
| file_filter_modified_start | string | no | - |
| file_filter_modified_end | string | no | - |
| quote_char | string | no | " |
| escape_char | string | no | - |

### path [string]

@@ -417,6 +419,14 @@ File modification time filter. The connector will filter some files based on the

File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
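Both options only matter when parsing CSV input. A minimal sketch of how they might be set in a `CosFile` source block (the path is a placeholder, and bucket/credential options are omitted for brevity):

```hocon
source {
  CosFile {
    path = "/data/example"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```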

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details.
10 changes: 10 additions & 0 deletions docs/en/connector-v2/source/FtpFile.md
@@ -79,6 +79,8 @@ If you use SeaTunnel Engine, it automatically integrates the hadoop jar when you
| common-options | | no | - |
| file_filter_modified_start | string | no | - |
| file_filter_modified_end | string | no | - |
| quote_char | string | no | " |
| escape_char | string | no | - |

### host [string]

@@ -440,6 +442,14 @@ File modification time filter. The connector will filter some files based on the

File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
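These two options apply only to CSV parsing. A minimal sketch of an `FtpFile` source using them (the host and path are placeholders; user, password, and other connection options are omitted):

```hocon
source {
  FtpFile {
    host = "ftp.example.com"
    path = "/data/example"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```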

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details.
16 changes: 13 additions & 3 deletions docs/en/connector-v2/source/HdfsFile.md
@@ -75,14 +75,16 @@ Read data from hdfs file system.
| file_filter_pattern | string | no | | Filter pattern, which is used for filtering files. |
| filename_extension | string | no | - | Filter filename extension, which is used for filtering files with a specific extension. Example: `csv` `.txt` `json` `.xml`. |
| compress_codec | string | no | none | The compress codec of files |
| archive_compress_codec | string | no | none |
| archive_compress_codec | string | no | none | |
| encoding | string | no | UTF-8 | |
| null_format | string | no | - | Only used when file_format_type is text. Defines which strings are interpreted as null, e.g. `\N`. |
| binary_chunk_size | int | no | 1024 | Only used when file_format_type is binary. The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory. |
| binary_complete_file_mode | boolean | no | false | Only used when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false. |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |
| file_filter_modified_start | string | no | - | File modification time filter. The connector will filter some files based on the last modification start time (start time included). The default date format is `yyyy-MM-dd HH:mm:ss`. |
| file_filter_modified_end | string | no | - | File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`. |
| quote_char | string | no | " | A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly. |
| escape_char | string | no | - | A single character that allows the quote or other special characters to appear inside a CSV field without ending the field. |

### file_format_type [string]

@@ -183,8 +185,8 @@ The supported compress codecs of files are shown below:

The supported compress codecs of archive files are shown below:

| archive_compress_codec | file_format | archive_compress_suffix |
|------------------------|-------------------|-------------------------|
| archive_compress_codec | file_format | archive_compress_suffix |
|------------------------|--------------------|-------------------------|
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
@@ -210,6 +212,14 @@ Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
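Both options only affect CSV parsing. A minimal sketch of an `HdfsFile` source that overrides the default quote character (the namenode address and path are placeholders):

```hocon
source {
  HdfsFile {
    fs.defaultFS = "hdfs://namenode:9000"
    path = "/data/example"
    file_format_type = "csv"
    # Use single quotes instead of the default double quote
    quote_char = "'"
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```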

### Tips

> If you use spark/flink, in order to use this connector you must ensure your spark/flink cluster has already integrated hadoop. The tested hadoop version is 2.x. If you use SeaTunnel Engine, it automatically integrates the hadoop jar when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
11 changes: 10 additions & 1 deletion docs/en/connector-v2/source/LocalFile.md
@@ -80,7 +80,8 @@ If you use SeaTunnel Engine, it automatically integrates the hadoop jar when you
| tables_configs | list | no | used to define a multiple table task |
| file_filter_modified_start | string | no | - |
| file_filter_modified_end | string | no | - |
| quote_char | string | no | " |
| escape_char | string | no | - |

### path [string]

The source file path.
@@ -415,6 +416,14 @@ File modification time filter. The connector will filter some files based on the

File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
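These options only apply when reading CSV. A minimal `LocalFile` sketch (the path is a placeholder):

```hocon
source {
  LocalFile {
    path = "/tmp/example/csv"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```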

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details
2 changes: 2 additions & 0 deletions docs/en/connector-v2/source/ObsFile.md
@@ -84,6 +84,8 @@ It only supports hadoop version **2.9.X+**.
| sheet_name | string | no | - | Read the sheet of the workbook. Only used when file_format is excel. |
| file_filter_modified_start | string | no | - | File modification time filter. The connector will filter some files based on the last modification start time (start time included). The default date format is `yyyy-MM-dd HH:mm:ss`. |
| file_filter_modified_end | string | no | - | File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`. |
| quote_char | string | no | " | A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly. |
| escape_char | string | no | - | A single character that allows the quote or other special characters to appear inside a CSV field without ending the field. |
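
The two CSV options above can be set like any other `ObsFile` option; a minimal sketch (the path is a placeholder, and bucket/authentication options are omitted):

```hocon
source {
  ObsFile {
    path = "/data/example"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```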

### Tips

10 changes: 10 additions & 0 deletions docs/en/connector-v2/source/OssFile.md
@@ -215,6 +215,8 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| common-options | config | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |
| file_filter_modified_start | string | no | - | File modification time filter. The connector will filter some files based on the last modification start time (start time included). The default date format is `yyyy-MM-dd HH:mm:ss`. |
| file_filter_modified_end | string | no | - | File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`. |
| quote_char | string | no | " | A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly. |
| escape_char | string | no | - | A single character that allows the quote or other special characters to appear inside a CSV field without ending the field. |

### file_format_type [string]

@@ -263,6 +265,14 @@ Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
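Both options take effect only for CSV reads. A minimal sketch of an `OssFile` source using them (the path is a placeholder; bucket, endpoint, and credential options are omitted):

```hocon
source {
  OssFile {
    path = "/data/example"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```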

### file_filter_pattern [string]

Filter pattern, which is used for filtering files. If you only want to filter based on file names, simply write the regular file names; if you also want to filter based on the file directory, the expression needs to start with `path`.
10 changes: 10 additions & 0 deletions docs/en/connector-v2/source/OssJindoFile.md
@@ -85,6 +85,8 @@ It only supports hadoop version **2.9.X+**.
| common-options | | no | - |
| file_filter_modified_start | string | no | - |
| file_filter_modified_end | string | no | - |
| quote_char | string | no | " |
| escape_char | string | no | - |

### path [string]

@@ -398,6 +400,14 @@ File modification time filter. The connector will filter some files based on the

File modification time filter. The connector will filter some files based on the last modification end time (end time excluded). The default date format is `yyyy-MM-dd HH:mm:ss`.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
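These options are only relevant for CSV parsing. A minimal `OssJindoFile` sketch (the path is a placeholder; bucket and credential options are omitted):

```hocon
source {
  OssJindoFile {
    path = "/data/example"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```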

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details.
10 changes: 10 additions & 0 deletions docs/en/connector-v2/source/S3File.md
@@ -222,6 +222,8 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| file_filter_pattern | string | no | | Filter pattern, which is used for filtering files. |
| filename_extension | string | no | - | Filter filename extension, which is used for filtering files with a specific extension. Example: `csv` `.txt` `json` `.xml`. |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |
| quote_char | string | no | " | A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly. |
| escape_char | string | no | - | A single character that allows the quote or other special characters to appear inside a CSV field without ending the field. |

### file_format_type [string]

@@ -349,6 +351,14 @@ Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.

### quote_char [string]

A single character that encloses CSV fields, allowing fields with commas, line breaks, or quotes to be read correctly.

### escape_char [string]

A single character that allows the quote or other special characters to appear inside a CSV field without ending the field.
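Both options apply only to CSV input. A minimal sketch of an `S3File` source using them (the path is a placeholder; bucket and credential options are omitted):

```hocon
source {
  S3File {
    path = "/seatunnel/csv"
    file_format_type = "csv"
    # Fields wrapped in double quotes may contain commas and line breaks
    quote_char = "\""
    # A backslash lets a literal quote appear inside a quoted field
    escape_char = "\\"
  }
}
```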

## Example

1. In this example, we read data from the S3 path `s3a://seatunnel-test/seatunnel/text`; the files under this path are in ORC format.