
Commit 2dd251c

tiup: add document of tispark support in tiup cluster (pingcap#3611)

* tiup: add document of tispark support in tiup cluster
* Apply suggestions from code review
* Update the order of warning and heading
* Update yaml file links
* Update tispark-deployment-topology.md

Authored by: TomShawn, yikeke, lilin90
Co-authored-by: Keke Yi <[email protected]>
Co-authored-by: Lilian Lee <[email protected]>

1 parent 976ec8f · commit 2dd251c · 6 files changed: +248 −0 lines

Diff for: TOC.md (+1)

```diff
@@ -31,6 +31,7 @@
     + [TiFlash Topology](/tiflash-deployment-topology.md)
     + [TiCDC Topology](/ticdc-deployment-topology.md)
     + [TiDB Binlog Topology](/tidb-binlog-deployment-topology.md)
+    + [TiSpark Topology](/tispark-deployment-topology.md)
     + [Cross-DC Topology](/geo-distributed-deployment-topology.md)
     + [Hybrid Topology](/hybrid-deployment-topology.md)
     + Install and Start
```

Diff for: config-templates/complex-tispark.yaml (new file, +150)

```yaml
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

# # Monitored variables are applied to all the machines.
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  # deploy_dir: "/tidb-deploy/monitored-9100"
  # data_dir: "/tidb-data/monitored-9100"
  # log_dir: "/tidb-deploy/monitored-9100/log"

# # Server configs are used to specify the runtime configuration of TiDB components.
# # All configuration items can be found in TiDB docs:
# # - TiDB: https://pingcap.com/docs/stable/reference/configuration/tidb-server/configuration-file/
# # - TiKV: https://pingcap.com/docs/stable/reference/configuration/tikv-server/configuration-file/
# # - PD: https://pingcap.com/docs/stable/reference/configuration/pd-server/configuration-file/
# # All configuration items use dots to represent the hierarchy, e.g.:
# #   readpool.storage.use-unified-pool
# #
# # You can overwrite this configuration via the instance-level `config` field.

server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    # server.grpc-concurrency: 4
    # raftstore.apply-pool-size: 2
    # raftstore.store-pool-size: 2
    # rocksdb.max-sub-compactions: 1
    # storage.block-cache.capacity: "16GB"
    # readpool.unified.max-thread-count: 12
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64

pd_servers:
  - host: 10.0.1.4
    # ssh_port: 22
    # name: "pd-1"
    # client_port: 2379
    # peer_port: 2380
    # deploy_dir: "/tidb-deploy/pd-2379"
    # data_dir: "/tidb-data/pd-2379"
    # log_dir: "/tidb-deploy/pd-2379/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.pd` values.
    # config:
    #   schedule.max-merge-region-size: 20
    #   schedule.max-merge-region-keys: 200000
  - host: 10.0.1.5
  - host: 10.0.1.6

tidb_servers:
  - host: 10.0.1.1
    # ssh_port: 22
    # port: 4000
    # status_port: 10080
    # deploy_dir: "/tidb-deploy/tidb-4000"
    # log_dir: "/tidb-deploy/tidb-4000/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tidb` values.
    # config:
    #   log.slow-query-file: tidb-slow-overwrited.log
  - host: 10.0.1.2
  - host: 10.0.1.3

tikv_servers:
  - host: 10.0.1.7
    # ssh_port: 22
    # port: 20160
    # status_port: 20180
    # deploy_dir: "/tidb-deploy/tikv-20160"
    # data_dir: "/tidb-data/tikv-20160"
    # log_dir: "/tidb-deploy/tikv-20160/log"
    # numa_node: "0,1"
    # # The following configs are used to overwrite the `server_configs.tikv` values.
    # config:
    #   server.grpc-concurrency: 4
    #   server.labels: { zone: "zone1", dc: "dc1", host: "host1" }
  - host: 10.0.1.8
  - host: 10.0.1.9

# NOTE: TiSpark support is an experimental feature. It is not recommended to use it
# in production at present.
# To use TiSpark, you need to manually install Java Runtime Environment (JRE) 8 on the
# host; see the OpenJDK doc for a reference: https://openjdk.java.net/install/
# If you have already installed JRE 1.8 at a location other than the default of the
# system's package management tool, you can use the "java_home" field to set the
# JAVA_HOME variable.
# NOTE: Only 1 master node is supported for now.
tispark_masters:
  - host: 10.0.1.21
    # ssh_port: 22
    # port: 7077
    # web_port: 8080
    # deploy_dir: "/tidb-deploy/tispark-master-7077"
    # java_home: "/usr/local/bin/java-1.8.0"
    # spark_config:
    #   spark.driver.memory: "2g"
    #   spark.eventLog.enabled: "False"
    #   spark.tispark.grpc.framesize: 268435456
    #   spark.tispark.grpc.timeout_in_sec: 100
    #   spark.tispark.meta.reload_period_in_sec: 60
    #   spark.tispark.request.command.priority: "Low"
    #   spark.tispark.table.scan_concurrency: 256
    # spark_env:
    #   SPARK_EXECUTOR_CORES: 5
    #   SPARK_EXECUTOR_MEMORY: "10g"
    #   SPARK_WORKER_CORES: 5
    #   SPARK_WORKER_MEMORY: "10g"

# NOTE: Multiple worker nodes on the same host are not supported by Spark.
tispark_workers:
  - host: 10.0.1.22
    # ssh_port: 22
    # port: 7078
    # web_port: 8081
    # deploy_dir: "/tidb-deploy/tispark-worker-7078"
    # java_home: "/usr/local/bin/java-1.8.0"
  - host: 10.0.1.23

monitoring_servers:
  - host: 10.0.1.10
    # ssh_port: 22
    # port: 9090
    # deploy_dir: "/tidb-deploy/prometheus-8249"
    # data_dir: "/tidb-data/prometheus-8249"
    # log_dir: "/tidb-deploy/prometheus-8249/log"

grafana_servers:
  - host: 10.0.1.10
    # port: 3000
    # deploy_dir: /tidb-deploy/grafana-3000

alertmanager_servers:
  - host: 10.0.1.10
    # ssh_port: 22
    # web_port: 9093
    # cluster_port: 9094
    # deploy_dir: "/tidb-deploy/alertmanager-9093"
    # data_dir: "/tidb-data/alertmanager-9093"
    # log_dir: "/tidb-deploy/alertmanager-9093/log"
```
Diff for: config-templates/simple-tispark.yaml (new file, +45)

```yaml
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 10.0.1.4
  - host: 10.0.1.5
  - host: 10.0.1.6

tidb_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3

tikv_servers:
  - host: 10.0.1.7
  - host: 10.0.1.8
  - host: 10.0.1.9

# NOTE: TiSpark support is an experimental feature. It is not recommended to use it
# in production at present.
# To use TiSpark, you need to manually install Java Runtime Environment (JRE) 8 on the
# host; see the OpenJDK doc for a reference: https://openjdk.java.net/install/
# NOTE: Only 1 master node is supported for now.
tispark_masters:
  - host: 10.0.1.21

# NOTE: Multiple worker nodes on the same host are not supported by Spark.
tispark_workers:
  - host: 10.0.1.22
  - host: 10.0.1.23

monitoring_servers:
  - host: 10.0.1.10

grafana_servers:
  - host: 10.0.1.10

alertmanager_servers:
  - host: 10.0.1.10
```

Diff for: production-deployment-using-tiup.md (+4)

```diff
@@ -97,6 +97,10 @@ The following topology documents provide a cluster configuration template for ea

     This is to deploy TiDB Binlog along with the minimal cluster topology. TiDB Binlog is a widely used component for replicating incremental data. It provides near real-time backup and replication.

+- [TiSpark deployment topology](/tispark-deployment-topology.md)
+
+    This is to deploy TiSpark along with the minimal cluster topology. TiSpark is a component built for running Apache Spark on top of TiDB/TiKV to answer OLAP queries. Currently, TiUP cluster's support for TiSpark is still **experimental**.
+
 - [Hybrid deployment topology](/hybrid-deployment-topology.md)

     This is to deploy multiple instances on a single machine. You need to add extra configurations for the directory, port, resource ratio, and label.
```

Diff for: tispark-deployment-topology.md (new file, +44)

---
title: TiSpark Deployment Topology
summary: Learn the deployment topology of TiSpark using TiUP based on the minimal TiDB topology.
---

# TiSpark Deployment Topology

> **Warning:**
>
> TiSpark support in the TiUP cluster is still an experimental feature. It is **NOT** recommended to use it in the production environment.

This document introduces the TiSpark deployment topology and how to deploy TiSpark based on the minimum cluster topology.

TiSpark is a component built for running Apache Spark on top of TiDB/TiKV to answer complex OLAP queries. It brings the benefits of both the Spark platform and the distributed TiKV cluster to TiDB, making TiDB a one-stop solution for both online transactions and analytics.

For more information about TiSpark, see [TiSpark User Guide](/tispark-overview.md).

## Topology information

| Instance | Count | Physical machine configuration | IP | Configuration |
| :-- | :-- | :-- | :-- | :-- |
| TiDB | 3 | 16 VCore 32GB * 1 | 10.0.1.1 <br/> 10.0.1.2 <br/> 10.0.1.3 | Default port <br/> Global directory configuration |
| PD | 3 | 4 VCore 8GB * 1 | 10.0.1.4 <br/> 10.0.1.5 <br/> 10.0.1.6 | Default port <br/> Global directory configuration |
| TiKV | 3 | 16 VCore 32GB 2TB (NVMe SSD) * 1 | 10.0.1.7 <br/> 10.0.1.8 <br/> 10.0.1.9 | Default port <br/> Global directory configuration |
| TiSpark | 3 | 8 VCore 16GB * 1 | 10.0.1.21 (master) <br/> 10.0.1.22 (worker) <br/> 10.0.1.23 (worker) | Default port <br/> Global directory configuration |
| Monitoring & Grafana | 1 | 4 VCore 8GB * 1 500GB (SSD) | 10.0.1.11 | Default port <br/> Global directory configuration |

## Topology templates

- [Simple TiSpark topology template](/config-templates/simple-tispark.yaml)
- [Complex TiSpark topology template](/config-templates/complex-tispark.yaml)

> **Note:**
>
> - You do not need to manually create the `tidb` user in the configuration file. The TiUP cluster component automatically creates the `tidb` user on the target machines. You can customize the user, or keep the user consistent with the control machine.
> - If you configure the deployment directory as a relative path, the cluster is deployed in the home directory of the user.

## Prerequisites

TiSpark is based on the Apache Spark cluster, so before you start the TiDB cluster that contains TiSpark, you must ensure that Java Runtime Environment (JRE) 8 is installed on the server that deploys TiSpark. Otherwise, TiSpark cannot be started.

TiUP does not support installing JRE automatically. You need to install it on your own. For detailed installation instructions, see [How to download and install prebuilt OpenJDK packages](https://openjdk.java.net/install/).

If JRE 8 has already been installed on the deployment server but is not in the path of the system's default package management tool, you can specify the path of the JRE environment by setting the `java_home` parameter in the topology configuration. This parameter corresponds to the `JAVA_HOME` system environment variable.
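Since TiUP does not verify the JRE for you, you might check each TiSpark host's `java -version` output before deploying. A small illustrative helper (not part of TiUP; the parsing is an assumption based on the common OpenJDK output format, where the legacy `1.8.x` scheme means Java 8):

```python
import re

def jre_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.
    Handles both the legacy '1.8.0_292' scheme and the modern '11.0.2' scheme."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if m is None:
        raise ValueError("unrecognized `java -version` output")
    major = int(m.group(1))
    minor = int(m.group(2) or 0)
    # In the legacy scheme, '1.8' denotes Java 8.
    return minor if major == 1 else major

assert jre_major_version('openjdk version "1.8.0_292"') == 8
```

A deployment check could then require `jre_major_version(...) == 8` on every host listed under `tispark_masters` and `tispark_workers`.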

Diff for: tiup/tiup-cluster.md (+4)

```diff
@@ -430,6 +430,10 @@ tiup cluster patch test-cluster /tmp/tidb-hotfix.tar.gz -N 172.16.4.5:4000

 ## Import TiDB Ansible cluster

+> **Note:**
+>
+> Currently, TiUP cluster's support for TiSpark is still **experimental**. Importing a TiDB cluster with TiSpark enabled is not supported.
+
 Before TiUP was released, TiDB Ansible was often used to deploy TiDB clusters. To enable TiUP to take over a cluster deployed by TiDB Ansible, use the `import` command.

 The usage of the `import` command is as follows:
```
