|
[English](./getting-started.md)|[中文](../zh/reference/getting-started.md)
### Tutorials

This guide walks through setting up a project and running the sample queries.

#### Setup on Linux/macOS

Running Quicksql on Linux/macOS is simple, but the environment must be fully prepared first. The dependencies are:

· Java >= 1.8

· Spark >= 2.2 (currently required; planned to become optional)

· Flink >= 1.9 (optional)

1. Download and extract the binary release package from <https://github.com/Qihoo360/Quicksql/releases>;
2. Enter the conf directory and configure the environment variables in quicksql-env.sh;

```shell
$ tar -zxvf ./quicksql-release-bin.tar.gz
$ cd quicksql-release-0.7.0
$ vim ./conf/quicksql-env.sh # Set your basic environment.
```
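
For reference, quicksql-env.sh mainly needs to point Quicksql at your local installations. A minimal sketch, assuming the conventional JAVA_HOME/SPARK_HOME/FLINK_HOME variable names; the paths are placeholders for your own setup:

```shell
# Illustrative quicksql-env.sh settings; adjust every path to your installation.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk    # Java >= 1.8
export SPARK_HOME=/opt/spark-2.2.0              # Spark >= 2.2
export FLINK_HOME=/opt/flink-1.9.0              # Flink >= 1.9, only if you use the Flink runner
```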

##### Running the sample query

Enter the bin directory and execute the quicksql-example script. (The sample joins an embedded Elasticsearch server with a CSV data source and applies a filter.)

```shell
$ ./bin/quicksql-example com.qihoo.qsql.CsvJoinWithEsExample # TODO: replace with an option flag that can also print the SQL statement
```

If the following result is displayed, the environment is fully set up and you can try further operations.

```sql
+------+-------+----------+--------+------+-------+------+
|deptno| name| city|province|digest| type|stu_id|
+------+-------+----------+--------+------+-------+------+
| 40|Scholar| BROCKTON| MA| 59498|Scholar| null|
| 45| Master| CONCORD| NH| 34035| Master| null|
| 40|Scholar|FRAMINGHAM| MA| 65046|Scholar| null|
+------+-------+----------+--------+------+-------+------+
```

##### Running real queries

Before running a query on Quicksql, the connection information and the table and column metadata must be collected into the metastore.

The default metastore database is SQLite; see the deployment guide for switching to a different one. SQLite can be accessed as follows:

```shell
$ cd ./metastore/linux-x86/
$ sqlite3 ../schema.db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
COLUMNS DATABASE_PARAMS DBS TBLS
sqlite> SELECT TBLS.DB_ID, TBL_NAME, NAME AS DB_NAME FROM TBLS INNER JOIN DBS ON TBLS.DB_ID = DBS.DB_ID;
+------+---------------+-----------+
| DB_ID| TBL_NAME| DB_NAME|
+------+---------------+-----------+
| 1| call_center| BROCKTON|
| 2| catalog_page| CONCORD|
| 3| catalog_sales| FRAMINGHAM|
+------+---------------+-----------+
```

Of course, you do not have to insert the metadata by hand!

Quicksql provides collection scripts for many standard data sources, which pull metadata in bulk.

The data sources currently supported by the collection script are **Hive, MySQL, Kylin, Elasticsearch, Oracle, MongoDB**.

Run it as follows (note: the -r parameter accepts LIKE syntax: '%' matches anything, '_' matches a single character, '?' marks an optional match):

```shell
$ ./bin/metadata-extract -p "<SCHEMA-JSON>" -d "<DATA-SOURCE>" -r "<TABLE-NAME-REGEX>"
```

(See the end of this page for the detailed SCHEMA-JSON format.)

**Examples**

Collecting metadata from a **MySQL** database:

```shell
$ ./bin/metadata-extract -p "{\"jdbcDriver\": \"com.mysql.jdbc.Driver\", \"jdbcUrl\": \"jdbc:mysql://localhost:3306/db\", \"jdbcUser\": \"user\",\"jdbcPassword\": \"pass\"}" -d "mysql" -r "my_table"
```

Collecting metadata from an **Elasticsearch** store:

```shell
$ ./bin/metadata-extract -p "{\"esNodes\": \"192.168.1.1\",\"esPort\": \"9090\",\"esUser\": \"user\",\"esPass\": \"pass\",\"esIndex\": \"index/type\"}" -d "es" -r "%"
```

On success, output like the following is returned:

```shell
1970-01-01 15:09:43,119 [main] INFO - Connecting server.....
1970-01-01 15:09:44,000 [main] INFO - Connected successfully!!
1970-01-01 15:09:44,121 [main] INFO - Successfully collected metadata for 2 tables!!
1970-01-01 15:09:45,622 [main] INFO - [my_table, my_type]!!
```

**Connection information**

The JSON structures for collecting from common data sources are shown below:

```shell
## MySQL
{
    "jdbcDriver": "com.mysql.jdbc.Driver",
    "jdbcUrl": "jdbc:mysql://localhost:3306/db",
    "jdbcUser": "USER",
    "jdbcPassword": "PASSWORD"
}
## Oracle
{
    "jdbcDriver": "oracle.jdbc.driver.OracleDriver",
    "jdbcUrl": "jdbc:oracle:thin:@localhost:1521/namespace",
    "jdbcUser": "USER",
    "jdbcPassword": "PASSWORD"
}
## Elasticsearch
{
    "esNodes": "192.168.1.1",
    "esPort": "9000",
    "esUser": "USER",
    "esPass": "PASSWORD",
    "esIndex": "index/type"
}
## Hive (Hive metadata stored in MySQL)
{
    "jdbcDriver": "com.mysql.jdbc.Driver",
    "jdbcUrl": "jdbc:mysql://localhost:3306/db",
    "jdbcUser": "USER",
    "jdbcPassword": "PASSWORD",
    "dbName": "hive_db"
}
## Hive-Jdbc (Hive metadata accessed via JDBC)
{
    "jdbcDriver": "org.apache.hive.jdbc.HiveDriver",
    "jdbcUrl": "jdbc:hive2://localhost:7070/learn_kylin",
    "jdbcUser": "USER",
    "jdbcPassword": "PASSWORD",
    "dbName": "default"
}
## Kylin
{
    "jdbcDriver": "org.apache.kylin.jdbc.Driver",
    "jdbcUrl": "jdbc:kylin://localhost:7070/learn_kylin",
    "jdbcUser": "ADMIN",
    "jdbcPassword": "KYLIN",
    "dbName": "default"
}
```

Note: double quotes are special characters in the shell, so they must be escaped when the JSON parameter is passed!
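
If you would rather avoid the backslashes, a common shell idiom is to wrap the whole JSON parameter in single quotes, so the double quotes inside need no escaping (connection values below are the same placeholders as above):

```shell
$ ./bin/metadata-extract \
    -p '{"jdbcDriver": "com.mysql.jdbc.Driver", "jdbcUrl": "jdbc:mysql://localhost:3306/db", "jdbcUser": "user", "jdbcPassword": "pass"}' \
    -d "mysql" -r "my_table"
```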

##### Submitting queries from the command line

Querying from the command line is one of the most basic ways to use Quicksql.

As with Hive and MySQL, run `quicksql.sh -e "YOUR SQL"` to execute a query; the result set is printed to the terminal.

**Examples**

1. A simple query, executed inside the Quicksql core;

```shell
$ ./bin/quicksql.sh -e "SELECT 1"
```

Want it to run on the Spark or Flink engine? Use the runner parameter;

```shell
$ ./bin/quicksql.sh -e "SELECT 1" --runner spark|flink
```

2. An Elasticsearch data source query, which Quicksql executes over a RestClient connection;

```shell
$ ./bin/quicksql.sh -e "SELECT approx_count_distinct(city), state FROM geo_mapping GROUP BY state LIMIT 10"
```

Want to persist the result to storage? Try the INSERT INTO syntax:

```shell
$ ./bin/quicksql.sh -e "INSERT INTO \`hdfs://cluster:9000/hello/world\` IN HDFS SELECT approx_count_distinct(city), state FROM geo_mapping GROUP BY state LIMIT 10"
```

**Other parameters**

The examples above cover the basic usage; to pass additional parameters to the compute engine, refer to the table below (a combined example follows the notes):

| Property Name   | Default     | Meaning                                          |
| --------------- | ----------- | ------------------------------------------------ |
| -e              | --          | The SQL statement to run; required for a query.  |
| -h\|--help      | --          | Detailed description of the command parameters   |
| --runner        | dynamic     | Execution engine: dynamic, jdbc, spark, or flink |
| --master        | yarn-client | Engine execution mode                            |
| --worker_memory | 1G          | Memory allocated to each executor                |
| --driver_memory | 3G          | Memory allocated to the driver                   |
| --worker_num    | 20          | Executor parallelism                             |

Note:

    (1) Default values for runner, master, worker_memory, etc. can be set in quicksql-env.sh;

    (2) In non-distributed execution, parameters such as master and worker_memory take no effect even if set;
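
Putting the table together, an illustrative invocation that sets several of these parameters at once (the SQL statement and the resource values are placeholders, reusing the geo_mapping table from the examples above):

```shell
$ ./bin/quicksql.sh -e "SELECT city, COUNT(*) FROM geo_mapping GROUP BY city" \
    --runner spark \
    --master yarn-client \
    --worker_memory 2G \
    --driver_memory 4G \
    --worker_num 10
```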

##### Submitting queries from an application

Quicksql supports queries over a JDBC connection in client/server mode: an application can pull in the driver package, establish a connection to the server, and run federated queries.

#### Server side

**Starting the server**

```shell
$ ./bin/quicksql-server.sh start -P 5888 -R spark -M yarn-client
```

The first argument is one of start|stop|restart|status. -P/-R/-M are optional and set the port, the execution engine, and the task scheduling mode respectively:

-P: the server port; defaults to 5888

-R: the execution engine; Spark and Flink are supported

-M: the resource scheduling mode for the Spark job, e.g. yarn-client or yarn-cluster; defaults to local[1]
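
The other verbs from the start|stop|restart|status list above are invoked the same way; for example, to check on and then shut down a running server:

```shell
$ ./bin/quicksql-server.sh status   # report whether the server is running
$ ./bin/quicksql-server.sh stop     # shut the server down
```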

#### Client side

**Application integration**

Add the quicksql-client and avatica dependencies to the project's pom.xml:

```xml
<dependency>
    <groupId>com.qihoo.qsql</groupId>
    <artifactId>qsql</artifactId>
    <version>0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite.avatica</groupId>
    <artifactId>avatica-server</artifactId>
    <version>1.12.0</version>
</dependency>
```

Java example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class QuicksqlJdbcExample { // example wrapper class
    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        Class.forName("com.qihoo.qsql.client.Driver"); // register the Driver

        Properties properties = new Properties();
        properties.setProperty("runner", "jdbc");
        String url = "jdbc:quicksql:url=http://localhost:5888";
        Connection connection = DriverManager.getConnection(url, properties);
        Statement pS = connection.createStatement();
        String sql = "select * from (values ('a', 1), ('b', 2))";
        ResultSet rs = pS.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
            System.out.println(rs.getString(2));
        }
        rs.close();
        pS.close();
    }
}
```

1. Register the Quicksql driver: com.qihoo.qsql.client.Driver

2. The properties configuration supports these parameters:

   runner: the execution engine, one of dynamic, jdbc, spark, flink

   acceptedResultsNum: the maximum number of rows a query may return

   appName: the name of the launched spark/flink instance

3. The server URL has the form jdbc:quicksql:url=http:// + the server's hostname or IP address + the server's port, e.g. jdbc:quicksql:url=http://localhost:5888 as in the code above (the URL is also printed in the server's log file).

4. Everything else works as in ordinary JDBC, including the Connection, Statement, ResultSet, and ResultSetMetaData classes and iterating over the results.