---
{
    "title": "Bitmap",
    "language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

The BITMAP type can be used in Duplicate, Unique, and Aggregate tables, but only as a Value column, never as a Key column. When a BITMAP column is used in an Aggregate table, its aggregation type must be BITMAP_UNION. Users do not need to specify a length or default value; the length is managed by the system based on the degree of data aggregation. For more documentation, refer to [Bitmap](../../../sql-manual/sql-data-types/aggregate/BITMAP.md).

## Usage Example

### Step 1: Prepare Data

Create the following CSV file: `test_bitmap.csv`

```
1|koga|17723
2|nijg|146285
3|lojn|347890
4|lofn|489871
5|jfin|545679
6|kon|676724
7|nhga|767689
8|nfubg|879878
9|huang|969798
10|buag|97997
```

### Step 2: Create Table in Database

```sql
CREATE TABLE testdb.test_bitmap(
    typ_id BIGINT NULL COMMENT "ID",
    hou VARCHAR(10) NULL COMMENT "one",
    arr BITMAP BITMAP_UNION NOT NULL COMMENT "two"
)
AGGREGATE KEY(typ_id,hou)
DISTRIBUTED BY HASH(typ_id,hou) BUCKETS 10;
```

### Step 3: Load Data

```shell
curl --location-trusted -u <doris_user>:<doris_password> \
    -H "column_separator:|" \
    -H "columns:typ_id,hou,arr,arr=to_bitmap(arr)" \
    -T test_bitmap.csv \
    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_bitmap/_stream_load
```
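
Here the `columns` header maps the third CSV field through `to_bitmap` before it is written into the BITMAP column. The same conversion applies when inserting rows directly with SQL; the statement below is a minimal sketch of that usage (the row values are hypothetical and not part of the sample data):

```sql
-- Convert a plain integer into a bitmap value at insert time
INSERT INTO testdb.test_bitmap VALUES (11, 'zhang', to_bitmap(123456));
```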

### Step 4: Check Loaded Data

```sql
mysql> select typ_id,hou,bitmap_to_string(arr) from testdb.test_bitmap;
+--------+-------+-----------------------+
| typ_id | hou   | bitmap_to_string(arr) |
+--------+-------+-----------------------+
|      4 | lofn  | 489871                |
|      6 | kon   | 676724                |
|      9 | huang | 969798                |
|      3 | lojn  | 347890                |
|      8 | nfubg | 879878                |
|      7 | nhga  | 767689                |
|      1 | koga  | 17723                 |
|      2 | nijg  | 146285                |
|      5 | jfin  | 545679                |
|     10 | buag  | 97997                 |
+--------+-------+-----------------------+
10 rows in set (0.07 sec)
```
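
In practice, BITMAP columns are usually consumed through aggregate functions rather than read back row by row. As a minimal sketch of that usage, merging all bitmaps gives the deduplicated element count, which should be 10 for the sample data above:

```sql
-- Merge all bitmaps and count the distinct elements
SELECT bitmap_union_count(arr) FROM testdb.test_bitmap;

-- Equivalent form: merge first, then count
SELECT bitmap_count(bitmap_union(arr)) FROM testdb.test_bitmap;
```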
---
{
    "title": "HLL",
    "language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

HLL (HyperLogLog) is used for approximate deduplication and performs better than COUNT DISTINCT on large data volumes. Data loaded into an HLL column must be converted with functions such as hll_hash. For more documentation, refer to [HLL](../../../sql-manual/sql-data-types/aggregate/HLL.md).

## Usage Example

### Step 1: Prepare Data

Create the following CSV file: `test_hll.csv`

```
1001|koga
1002|nijg
1003|lojn
1004|lofn
1005|jfin
1006|kon
1007|nhga
1008|nfubg
1009|huang
1010|buag
```

### Step 2: Create Table in Database

```sql
CREATE TABLE testdb.test_hll(
    typ_id BIGINT NULL COMMENT "ID",
    typ_name VARCHAR(10) NULL COMMENT "NAME",
    pv HLL HLL_UNION NOT NULL COMMENT "hll"
)
AGGREGATE KEY(typ_id,typ_name)
DISTRIBUTED BY HASH(typ_id) BUCKETS 10;
```

### Step 3: Load Data

```shell
curl --location-trusted -u <doris_user>:<doris_password> \
    -H "column_separator:|" \
    -H "columns:typ_id,typ_name,pv=hll_hash(typ_id)" \
    -T test_hll.csv \
    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_hll/_stream_load
```
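
Here `hll_hash(typ_id)` builds an HLL sketch from each id during Stream Load. The same conversion applies when inserting rows directly with SQL; the statement below is a minimal sketch of that usage (the row values are hypothetical and not part of the sample data):

```sql
-- Convert a plain value into an HLL sketch at insert time
INSERT INTO testdb.test_hll VALUES (1011, 'zhang', hll_hash(1011));
```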

### Step 4: Check Loaded Data

Use `hll_cardinality` to query the estimated distinct count per row:

```sql
mysql> select typ_id,typ_name,hll_cardinality(pv) from testdb.test_hll;
+--------+----------+---------------------+
| typ_id | typ_name | hll_cardinality(pv) |
+--------+----------+---------------------+
|   1010 | buag     |                   1 |
|   1002 | nijg     |                   1 |
|   1001 | koga     |                   1 |
|   1008 | nfubg    |                   1 |
|   1005 | jfin     |                   1 |
|   1009 | huang    |                   1 |
|   1004 | lofn     |                   1 |
|   1007 | nhga     |                   1 |
|   1003 | lojn     |                   1 |
|   1006 | kon      |                   1 |
+--------+----------+---------------------+
10 rows in set (0.06 sec)
```
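
Each row reports a cardinality of 1 because every row holds a single hashed value. To estimate the distinct count across rows, the per-row sketches can be merged with an aggregate function; the query below is a minimal sketch of that usage, which should return approximately 10 for the sample data above:

```sql
-- Merge the HLL sketches of all rows and estimate the overall distinct count
SELECT hll_union_agg(pv) FROM testdb.test_hll;
```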
---
{
    "title": "Variant",
    "language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

The VARIANT type stores semi-structured JSON data, allowing complex structures that mix different data types (such as integers, strings, and booleans) to be stored without predefining specific columns in the table schema. It is particularly suitable for complex nested structures that may change at any time. During writes, the VARIANT type automatically infers the structure and types of the columns, dynamically merges the written schemas, and stores the JSON keys and their corresponding values as columns and dynamic sub-columns. For more documentation, please refer to [VARIANT](../../../sql-manual/sql-data-types/semi-structured/VARIANT.md).

## Usage Limitations

Loading VARIANT columns supports the CSV and JSON formats.

## Loading CSV Format

### Step 1: Prepare Data

Create a CSV file named `test_variant.csv` with the following content:

```
14186154924|PushEvent|{"avatar_url":"https://avatars.githubusercontent.com/u/282080?","display_login":"brianchandotcom","gravatar_id":"","id":282080,"login":"brianchandotcom","url":"https://api.github.com/users/brianchandotcom"}|{"id":1920851,"name":"brianchandotcom/liferay-portal","url":"https://api.github.com/repos/brianchandotcom/liferay-portal"}|{"before":"abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits":[""],"distinct_size":4,"head":"91edd3c8c98c214155191feb852831ec535580ba","push_id":6027092734,"ref":"refs/heads/master","size":4}|1|2020-11-14 02:00:00
```

### Step 2: Create Table in Database

Execute the following SQL statement to create the table:

```sql
CREATE TABLE IF NOT EXISTS testdb.test_variant (
    id BIGINT NOT NULL,
    type VARCHAR(30) NULL,
    actor VARIANT NULL,
    repo VARIANT NULL,
    payload VARIANT NULL,
    public BOOLEAN NULL,
    created_at DATETIME NULL,
    INDEX idx_payload (`payload`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for payload'
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES("replication_num" = "1");
```

### Step 3: Load Data

Using Stream Load as an example, load the data with the following command:

```shell
curl --location-trusted -u root: -T test_variant.csv -H "column_separator:|" http://127.0.0.1:8030/api/testdb/test_variant/_stream_load
```

Example of load results:

```json
{
    "TxnId": 12,
    "Label": "96cd6250-9c78-4a9f-b8b3-2b7cef0dd606",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 660,
    "LoadTimeMs": 213,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 6,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 183,
    "ReceiveDataTimeMs": 14,
    "CommitAndPublishTimeMs": 20
}
```

### Step 4: Check Loaded Data

Use the following SQL query to check the loaded data:

```sql
mysql> select * from testdb.test_variant\G
*************************** 1. row ***************************
        id: 14186154924
      type: PushEvent
     actor: {"avatar_url":"https://avatars.githubusercontent.com/u/282080?","display_login":"brianchandotcom","gravatar_id":"","id":282080,"login":"brianchandotcom","url":"https://api.github.com/users/brianchandotcom"}
      repo: {"id":1920851,"name":"brianchandotcom/liferay-portal","url":"https://api.github.com/repos/brianchandotcom/liferay-portal"}
   payload: {"before":"abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits":[""],"distinct_size":4,"head":"91edd3c8c98c214155191feb852831ec535580ba","push_id":6027092734,"ref":"refs/heads/master","size":4}
    public: 1
created_at: 2020-11-14 02:00:00
```
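
Individual JSON keys can also be read back as dynamic sub-columns of a VARIANT column. The query below is a minimal sketch of that access pattern, assuming the bracketed sub-column syntax described in the VARIANT documentation:

```sql
-- Read dynamic sub-columns of the payload VARIANT column
SELECT id,
       cast(payload['push_id'] AS BIGINT) AS push_id,
       cast(payload['ref'] AS STRING)     AS ref
FROM testdb.test_variant;
```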

## Loading JSON Format

### Step 1: Prepare Data

Create a JSON file named `test_variant.json` with the following content:

```json
{"id": "14186154924","type": "PushEvent","actor": {"id": 282080,"login":"brianchandotcom","display_login": "brianchandotcom","gravatar_id": "","url": "https://api.github.com/users/brianchandotcom","avatar_url": "https://avatars.githubusercontent.com/u/282080?"},"repo": {"id": 1920851,"name": "brianchandotcom/liferay-portal","url": "https://api.github.com/repos/brianchandotcom/liferay-portal"},"payload": {"push_id": 6027092734,"size": 4,"distinct_size": 4,"ref": "refs/heads/master","head": "91edd3c8c98c214155191feb852831ec535580ba","before": "abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits": [""]},"public": true,"created_at": "2020-11-13T18:00:00Z"}
```

### Step 2: Create Table in Database

Execute the following SQL statement to create the table:

```sql
CREATE TABLE IF NOT EXISTS testdb.test_variant (
    id BIGINT NOT NULL,
    type VARCHAR(30) NULL,
    actor VARIANT NULL,
    repo VARIANT NULL,
    payload VARIANT NULL,
    public BOOLEAN NULL,
    created_at DATETIME NULL,
    INDEX idx_payload (`payload`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for payload'
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(id) BUCKETS 10;
```

### Step 3: Load Data

Using Stream Load as an example, load the data with the following command:

```shell
curl --location-trusted -u root: -T test_variant.json -H "format:json" http://127.0.0.1:8030/api/testdb/test_variant/_stream_load
```

Example of load results:

```json
{
    "TxnId": 12,
    "Label": "96cd6250-9c78-4a9f-b8b3-2b7cef0dd606",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 660,
    "LoadTimeMs": 213,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 6,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 183,
    "ReceiveDataTimeMs": 14,
    "CommitAndPublishTimeMs": 20
}
```

### Step 4: Check Loaded Data

Use the following SQL query to check the loaded data:

```sql
mysql> select * from testdb.test_variant\G
*************************** 1. row ***************************
        id: 14186154924
      type: PushEvent
     actor: {"avatar_url":"https://avatars.githubusercontent.com/u/282080?","display_login":"brianchandotcom","gravatar_id":"","id":282080,"login":"brianchandotcom","url":"https://api.github.com/users/brianchandotcom"}
      repo: {"id":1920851,"name":"brianchandotcom/liferay-portal","url":"https://api.github.com/repos/brianchandotcom/liferay-portal"}
   payload: {"before":"abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits":[""],"distinct_size":4,"head":"91edd3c8c98c214155191feb852831ec535580ba","push_id":6027092734,"ref":"refs/heads/master","size":4}
    public: 1
created_at: 2020-11-14 02:00:00
```
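
Because dynamic sub-columns behave like ordinary columns, they can also be used for filtering and aggregation. The query below is a minimal sketch of that usage, assuming the bracketed sub-column access syntax described in the VARIANT documentation:

```sql
-- Filter and aggregate on dynamic sub-columns of the VARIANT columns
SELECT cast(repo['name'] AS STRING) AS repo_name,
       count(*)                     AS events
FROM testdb.test_variant
WHERE cast(payload['ref'] AS STRING) = 'refs/heads/master'
GROUP BY cast(repo['name'] AS STRING);
```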