-
Notifications
You must be signed in to change notification settings - Fork 2.1k
[Feature][Engine]Metalake support for data source information storage and management #9688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 37 commits
Commits
Show all changes
41 commits
Select commit
Hold shift + click to select a range
ca2d091
feat:Add metalake configuration option in seatunnel-env.sh and seatun…
wtybxqm 8ceffc9
Feat:Fetch metalake config from environment and merge into task's env
wtybxqm b5b3cc0
feat:Fetch metalake config by sourceId and replace placeholders when …
wtybxqm d4ec880
feat:Add MetalakeClient interface & GravitinoClient implementation; I…
wtybxqm d2a84c7
feat:apply spotless code style
wtybxqm f32ac80
feat: replace wildcard imports with explicit imports for okhttp3
wtybxqm d545047
fix: delete useless EnvOption
wtybxqm 84491da
feat: Integration Test for Metalake
wtybxqm 768815b
feat: Integration Test
wtybxqm a6d2327
feat: apply spotless code style
wtybxqm a1148f7
fix: fix the error of config structure
wtybxqm c0b304e
feat: place okhttp3 with apache httpclient
wtybxqm d66778e
fix: fix the bug of matainfo replace and remove log info
wtybxqm b741052
feat: apply spotless code style and remove extra code
wtybxqm 50d0158
feat: Add docs of Metalake in zh and en
wtybxqm bd027c7
Merge branch 'dev' into support_metalake_development
wtybxqm 84a2b0b
push an empty commit to trigger the workflow
wtybxqm 1acfb14
fix: add license header to conf file
wtybxqm cfeb84a
feat: download gravitino in test container
wtybxqm ec25bb0
feat: apply spotless codestyle and remove useless code
wtybxqm 1b42050
feat: support metalake for spark and flink engine; use assert connect…
wtybxqm d67545f
fix: fix the error of the assert connector
wtybxqm 4013ed8
fix: move metalakeIT to seatunnel-connector-v2-e2e
wtybxqm d0b1ff2
fix: apply spotless code style
wtybxqm 618fa28
fix: remove metalakeIT in seatunnel-engine-e2e
wtybxqm dc45daa
feat: add metalake integration test for spark and flink engine
wtybxqm 73a4d00
fix: download gravitino in flink container
wtybxqm b528a98
fix: move the docs to concept directory; remove extra test cases for …
wtybxqm b26243a
Merge branch 'dev' into support_metalake_development
wtybxqm 4f3f006
fix: add httpcore dependency in known-dependencies.txt
wtybxqm bba310a
fix: reuse PlaceholderUtils and refactor the getMetalakeConfig method…
wtybxqm 72a7538
fix: add license header
wtybxqm 46d19f7
fix: add capture group in pattern matcher
wtybxqm 8c96119
fix: refactor MetalakeConfigUtils and use PlaceholderUtils and JsonUtils
wtybxqm 4df31f8
fix: unify the version of httpcore
wtybxqm 0f3363a
fix: remove extra version of httpcore in kwown-dependencies.txt
wtybxqm dffb3b6
feat: get variables from env before from system
wtybxqm 5bb154f
feat: refactor getMetalakeConfig method and support transform
wtybxqm dde368c
fix: modify the arg name of replacePlaceholders method and refactor g…
wtybxqm 2362402
fix: create metalakeClient only once in getMetalakeConfig method
wtybxqm 788bfda
fix: add metalake in sidebar.js of docs
wtybxqm File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# METALAKE | ||
|
||
Since Seatunnel requires database usernames, passwords, and other sensitive information to be written in plaintext within scripts when executing tasks, this may lead to information leakage and is also difficult to maintain. When data source information changes, manual modifications are often required. | ||
|
||
To address this, Metalake is introduced. Data source information can be stored in Metalake systems such as Apache Gravitino. Task scripts then use `sourceId` and placeholders instead of actual usernames and passwords. At runtime, the Seatunnel engine retrieves the information from Metalake via HTTP requests and replaces the placeholders accordingly. | ||
|
||
To enable Metalake, you first need to modify the environment variables in **seatunnel-env.sh**: | ||
|
||
* `METALAKE_ENABLED` | ||
* `METALAKE_TYPE` | ||
* `METALAKE_URL` | ||
|
||
Set `METALAKE_ENABLED` to `true`. Currently, `METALAKE_TYPE` only supports `gravitino`. | ||
|
||
For Apache Gravitino, set `METALAKE_URL` to: | ||
|
||
``` | ||
http://host:port/api/metalakes/your_metalake_name/catalogs/ | ||
``` | ||
|
||
--- | ||
|
||
## Usage Example | ||
|
||
First, create a catalog in Gravitino, for example: | ||
|
||
```bash | ||
curl -L 'http://localhost:8090/api/metalakes/test_metalake/catalogs' \ | ||
-H 'Content-Type: application/json' \ | ||
-H 'Accept: application/vnd.gravitino.v1+json' \ | ||
-d '{ | ||
"name": "test_catalog", | ||
"type": "relational", | ||
"provider": "jdbc-mysql", | ||
"comment": "for metalake test", | ||
"properties": { | ||
"jdbc-driver": "com.mysql.cj.jdbc.Driver", | ||
"jdbc-url": "not used", | ||
"jdbc-user": "root", | ||
"jdbc-password": "Abc!@#135_seatunnel" | ||
} | ||
}' | ||
``` | ||
|
||
This creates a `test_catalog` under `test_metalake` (note: `metalake` itself must be created in advance). | ||
|
||
Thus, `METALAKE_URL` can be set to: | ||
|
||
``` | ||
http://localhost:8090/api/metalakes/test_metalake/catalogs/ | ||
``` | ||
|
||
You can then define the source as: | ||
|
||
```hocon | ||
source { | ||
Jdbc { | ||
url = "jdbc:mysql://mysql-e2e:3306/seatunnel?useSSL=false&serverTimezone=UTC&allowPublicKeyRetrieval=true" | ||
driver = "${jdbc-driver}" | ||
connection_check_timeout_sec = 100 | ||
sourceId = "test_catalog" | ||
user = "${jdbc-user}" | ||
password = "${jdbc-password}" | ||
query = "select * from source" | ||
} | ||
} | ||
``` | ||
|
||
Here, `sourceId` refers to the catalog name, allowing other fields to use `${}` placeholders. At runtime, they will be automatically replaced. Note that in sinks, the same `sourceId` name is used, and placeholders must always start with `${` and end with `}`. Each item can contain at most one placeholder, and there can be content outside the placeholder as well. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# METALAKE | ||
|
||
由于Seatunnel在执行任务时,需要将数据库用户名与密码等隐私信息明文写在脚本中,可能会导致信息泄露;并且维护较为困难,数据源信息发生变更时可能需要手动更改。 | ||
|
||
因此引入了metalake,将数据源的信息存储于Apache Gravitino等metalake中,任务脚本采用`sourceId`和占位符的方法来代替原本的用户名和密码等信息,运行时seatunnel-engine通过http请求从metalake获取信息,根据占位符进行替换。 | ||
|
||
若要使用metalake,首先要修改**seatunnel-env.sh**中的环境变量: | ||
|
||
* `METALAKE_ENABLED` | ||
* `METALAKE_TYPE` | ||
* `METALAKE_URL` | ||
|
||
将`METALAKE_ENABLED`设为`true`,`METALAKE_TYPE`当前仅支持设为`gravitino`。 | ||
|
||
对于Apache Gravitino,`METALAKE_URL`设为 | ||
|
||
``` | ||
http://host:port/api/metalakes/your_metalake_name/catalogs/ | ||
``` | ||
|
||
--- | ||
|
||
## 使用示例: | ||
|
||
用户需要先在Gravitino中创建catalog,如 | ||
|
||
```bash | ||
curl -L 'http://localhost:8090/api/metalakes/test_metalake/catalogs' | ||
-H 'Content-Type: application/json' | ||
-H 'Accept: application/vnd.gravitino.v1+json' | ||
-d '{ | ||
"name": "test_catalog", | ||
"type": "relational", | ||
"provider": "jdbc-mysql", | ||
"comment": "for metalake test", | ||
"properties": { | ||
"jdbc-driver": "com.mysql.cj.jdbc.Driver", | ||
"jdbc-url": "not used", | ||
"jdbc-user": "root", | ||
"jdbc-password": "Abc!@#135_seatunnel" | ||
} | ||
}' | ||
``` | ||
|
||
这样便在`test_metalake`中创建了一个`test_catalog`(`metalake`需要提前创建) | ||
|
||
于是`METALAKE_URL`可以设为 | ||
|
||
``` | ||
http://localhost:8090/api/metalakes/test_metalake/catalogs/ | ||
``` | ||
|
||
source可以写为 | ||
|
||
``` | ||
source { | ||
Jdbc { | ||
url = "jdbc:mysql://mysql-e2e:3306/seatunnel?useSSL=false&serverTimezone=UTC&allowPublicKeyRetrieval=true" | ||
driver = "${jdbc-driver}" | ||
connection_check_timeout_sec = 100 | ||
sourceId = "test_catalog" | ||
user = "${jdbc-user}" | ||
password = "${jdbc-password}" | ||
query = "select * from source" | ||
} | ||
} | ||
``` | ||
|
||
其中`sourceId`指代catalog的名称,从而其他项可以使用`${}`占位符,运行时会自动替换。注意,在sink中使用时,同样叫`sourceId`;使用占位符时必须以`${`开头,以`}`结尾,每一项最多只能包含一个占位符,占位符以外也可以有内容 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
66 changes: 66 additions & 0 deletions
66
seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/GravitinoClient.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.seatunnel.api.metalake; | ||
|
||
import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.JsonNode; | ||
|
||
import org.apache.seatunnel.common.utils.JsonUtils; | ||
|
||
import org.apache.http.HttpEntity; | ||
import org.apache.http.client.methods.CloseableHttpResponse; | ||
import org.apache.http.client.methods.HttpGet; | ||
import org.apache.http.impl.client.CloseableHttpClient; | ||
import org.apache.http.impl.client.HttpClients; | ||
import org.apache.http.util.EntityUtils; | ||
|
||
import java.io.IOException; | ||
|
||
public class GravitinoClient implements MetalakeClient { | ||
private final String metalakeUrl; | ||
|
||
public GravitinoClient(String metalakeUrl) { | ||
this.metalakeUrl = metalakeUrl; | ||
} | ||
|
||
@Override | ||
public String getType() { | ||
return "gravitino"; | ||
} | ||
|
||
@Override | ||
public JsonNode getMetaInfo(String sourceId) throws IOException { | ||
try (CloseableHttpClient client = HttpClients.createDefault()) { | ||
HttpGet request = new HttpGet(this.metalakeUrl + sourceId); | ||
request.addHeader("Accept", "application/vnd.gravitino.v1+json"); | ||
try (CloseableHttpResponse response = client.execute(request)) { | ||
HttpEntity entity = response.getEntity(); | ||
if (entity == null) { | ||
throw new RuntimeException("No response entity"); | ||
} | ||
JsonNode rootNode = JsonUtils.readTree(entity.getContent()); | ||
EntityUtils.consume(entity); | ||
JsonNode catalogNode = rootNode.get("catalog"); | ||
if (catalogNode == null) { | ||
throw new RuntimeException("Response JSON has no 'catalog' field"); | ||
} | ||
JsonNode propertiesNode = catalogNode.get("properties"); | ||
return propertiesNode; | ||
} | ||
} | ||
} | ||
} |
28 changes: 28 additions & 0 deletions
28
seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/MetalakeClient.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.seatunnel.api.metalake; | ||
|
||
import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.JsonNode; | ||
|
||
import java.io.IOException; | ||
|
||
public interface MetalakeClient { | ||
String getType(); | ||
|
||
JsonNode getMetaInfo(String sourceId) throws IOException; | ||
} |
44 changes: 44 additions & 0 deletions
44
seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/MetalakeClientFactory.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.seatunnel.api.metalake; | ||
|
||
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.function.Function; | ||
|
||
public class MetalakeClientFactory { | ||
private static final Map<String, Function<String, MetalakeClient>> REGISTRY = new HashMap<>(); | ||
|
||
static { | ||
register("gravitino", GravitinoClient::new); | ||
} | ||
|
||
private MetalakeClientFactory() {} | ||
|
||
public static void register(String type, Function<String, MetalakeClient> constructor) { | ||
REGISTRY.put(type.toLowerCase(), constructor); | ||
} | ||
|
||
public static MetalakeClient create(String type, String metalakeUrl) { | ||
Function<String, MetalakeClient> constructor = REGISTRY.get(type.toLowerCase()); | ||
if (constructor == null) { | ||
throw new IllegalArgumentException("Unknown MetalakeClient type: " + type); | ||
} | ||
return constructor.apply(metalakeUrl); | ||
} | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use sourceId not source_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source_name is the id of the catalog in Apache Gravitino, but maybe source_name is not used in other metalake type