[GLUTEN-7028][CH][Part-7] Support one pipeline write for mergetree #7788

Merged
merged 8 commits into from
Nov 10, 2024

Conversation

@baibaichen baibaichen commented Nov 3, 2024

What changes were proposed in this pull request?

This PR implements one pipeline write for mergetree:

  1. Since a ColumnBatch is returned in pipeline write, we now pass a string instead of a Map from C++ to Java to avoid a complex type. This also lets us use Spark's utilities to parse partitions, which is more compatible.
  2. Use BasicWriteJobStatsTracker to create `BasicWriteTaskStats`, which completes the previous work.
  3. Fix the header mismatch issue: we need to get the header from the pipeline builder instead of the query plan, since the query plan only contains the read rel output header.
  4. [GLUTEN-7028][CH][Part-5] Refactor: add NativeOutputWriter to unify CHDatasourceJniWrapper #7395 introduced message write; we now use it in pipeline mode for both mergetree and files (parquet, orc).
  5. Introduce RuntimeSettings and RuntimeConfig.
  6. Write mergetree in one pipeline.
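
To illustrate item 1, here is a minimal sketch of turning a partition string such as `year=2024/month=11` back into key/value pairs on the JVM side. The PR does not name the Spark utility it relies on, so this is a dependency-free stand-in. `PartitionPathSketch` and `parsePartitionPath` are hypothetical names, and the `key=value/key=value` layout is an assumption about what the native side returns.

```scala
// Hypothetical helper for illustration only; not part of Gluten or Spark.
object PartitionPathSketch {

  /** Parses "k1=v1/k2=v2" into ordered (key, value) pairs. */
  def parsePartitionPath(path: String): Seq[(String, String)] =
    path.split("/").toSeq.filter(_.nonEmpty).map { fragment =>
      // Split on the first '=' only, so values may themselves contain '='.
      val idx = fragment.indexOf('=')
      require(idx > 0, s"Invalid partition fragment: $fragment")
      (fragment.substring(0, idx), fragment.substring(idx + 1))
    }
}
```

Passing a single string across JNI keeps the native interface to a primitive-like type, and the JVM side stays free to reuse whatever partition parsing Spark already provides. The sketch does no escaping or unescaping of special characters, which a real implementation would need.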

More details on RuntimeSettings and RuntimeConfig

To avoid hard-coded strings in product code, we now define a type-safe ConfigEntry[T] for each configuration and setting key. For example:

      .setCHConfig("path", UTSystemParameters.diskOutputDataPath)
      .setCHConfig("tmp_path", s"/tmp/libch/$SPARK_DIR_NAME")

will be changed to:

      .set(RuntimeConfig.PATH.key, UTSystemParameters.diskOutputDataPath)
      .set(RuntimeConfig.TMP_PATH.key, s"/tmp/libch/$SPARK_DIR_NAME")
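
As a rough sketch of what a type-safe `ConfigEntry[T]` could look like, the snippet below pairs each key with a default value and a parser. This is not the PR's actual implementation. The `runtime_config.` prefix, the default values, `readFrom`, and the `HYPOTHETICAL_RETRIES` entry are all assumptions made for illustration, and `RuntimeConfigSketch` is a hypothetical stand-in for `RuntimeConfig`.

```scala
// Hypothetical sketch; the real Gluten classes may look different.
final case class ConfigEntry[T](key: String, defaultValue: T, parse: String => T) {
  // Look the key up in a plain string map and convert it to the declared type.
  def readFrom(conf: Map[String, String]): T =
    conf.get(key).map(parse).getOrElse(defaultValue)
}

object RuntimeConfigSketch {
  // The prefix and the defaults are assumptions, not taken from the PR.
  private val Prefix = "runtime_config."

  val PATH: ConfigEntry[String] =
    ConfigEntry[String](Prefix + "path", "/", s => s)
  val TMP_PATH: ConfigEntry[String] =
    ConfigEntry[String](Prefix + "tmp_path", "/tmp/libch", s => s)

  // Hypothetical numeric entry, showing that the value type travels with the key.
  val HYPOTHETICAL_RETRIES: ConfigEntry[Int] =
    ConfigEntry[Int](Prefix + "hypothetical_retries", 1, _.toInt)
}
```

With entries defined once, call sites reference `RuntimeConfigSketch.TMP_PATH.key` instead of repeating the string literal, so a typo in a key becomes a compile error rather than a silently ignored setting.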

(Fixes: #7028)


github-actions bot commented Nov 3, 2024

#7028

github-actions bot commented Nov 3, 2024

Run Gluten Clickhouse CI

@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 968ca64 to 06d559f Compare November 3, 2024 13:51
@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 06d559f to cd055cc Compare November 4, 2024 06:29
@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from cd055cc to 88e7262 Compare November 4, 2024 08:45
@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 88e7262 to 367d35d Compare November 5, 2024 13:39
github-actions bot commented Nov 5, 2024

Run Gluten Clickhouse CI on x86

@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 367d35d to 2d45aca Compare November 6, 2024 10:38
@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 2d45aca to 924aaec Compare November 8, 2024 07:38
@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 924aaec to 293b20c Compare November 8, 2024 16:25
…ine builder instead of query plan, since query plan only contains read rel output header.
[Improve] Using callback to update config without copying config
[Refactor] NativeExpressionEvaluator and refactor Java_org_apache_spark_sql_execution_datasources_CHDatasourceJniWrapper_createMergeTreeWriter
[Refactor] Simplify the logic of evaluating tmp_path
[Refactor] Remove Java_org_apache_gluten_vectorized_ExpressionEvaluatorJniWrapper_injectWriteFilesTempPath
@baibaichen baibaichen force-pushed the feature/onepipeline2 branch from 2aff749 to 3277924 Compare November 9, 2024 06:14
@baibaichen
Contributor Author

Run Gluten Clickhouse CI on x86

@baibaichen baibaichen merged commit be0435a into apache:main Nov 10, 2024
13 checks passed
@baibaichen baibaichen changed the title [GLUTEN-7028][CH][Part-7] Feature/onepipeline2 [GLUTEN-7028][CH][Part-7] Support one pipeline write for mergetree Nov 10, 2024
@baibaichen baibaichen deleted the feature/onepipeline2 branch November 10, 2024 05:17
@GlutenPerfBot
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_11_10_2024_time.csv log/native_master_11_09_2024_76b375d998_time.csv difference percentage
q1 14.21 15.21 0.999 107.03%
q2 15.64 13.80 -1.841 88.23%
q3 5.06 5.19 0.127 102.50%
q4 72.00 72.95 0.956 101.33%
q5 10.67 11.39 0.722 106.77%
q6 3.50 2.21 -1.293 63.06%
q7 8.80 8.47 -0.334 96.21%
q8 5.63 3.30 -2.331 58.59%
q9 26.21 26.66 0.442 101.69%
q10 7.68 8.84 1.160 115.11%
q11 38.28 38.64 0.364 100.95%
q12 1.36 1.34 -0.018 98.71%
q13 7.88 6.41 -1.469 81.36%
q14a 50.16 51.06 0.908 101.81%
q14b 43.28 41.55 -1.732 96.00%
q15 2.54 2.52 -0.013 99.50%
q16 49.03 49.37 0.335 100.68%
q17 5.10 4.82 -0.285 94.41%
q18 6.78 7.06 0.280 104.14%
q19 2.48 2.06 -0.411 83.38%
q20 1.37 1.45 0.079 105.74%
q21 1.13 1.22 0.084 107.40%
q22 8.12 8.33 0.202 102.49%
q23a 105.75 105.15 -0.603 99.43%
q23b 128.98 129.49 0.513 100.40%
q24a 105.78 114.30 8.520 108.05%
q24b 104.64 113.86 9.217 108.81%
q25 4.02 4.05 0.027 100.67%
q26 3.67 4.12 0.445 112.11%
q27 4.37 5.00 0.626 114.32%
q28 33.19 32.49 -0.700 97.89%
q29 10.65 10.91 0.259 102.43%
q30 4.67 5.02 0.349 107.48%
q31 7.16 7.22 0.064 100.89%
q32 1.25 1.21 -0.037 97.02%
q33 4.54 4.47 -0.075 98.35%
q34 3.91 3.94 0.030 100.76%
q35 9.88 8.50 -1.386 85.98%
q36 6.49 5.45 -1.045 83.91%
q37 5.19 5.02 -0.174 96.65%
q38 13.80 16.36 2.555 118.52%
q39a 3.14 3.27 0.122 103.88%
q39b 2.83 3.20 0.369 113.03%
q40 4.04 5.68 1.646 140.79%
q41 0.60 0.67 0.064 110.68%
q42 0.99 0.90 -0.086 91.30%
q43 4.69 4.84 0.147 103.13%
q44 10.44 9.29 -1.145 89.03%
q45 3.33 3.23 -0.103 96.89%
q46 4.03 3.94 -0.082 97.97%
q47 18.63 17.79 -0.843 95.48%
q48 5.01 5.13 0.126 102.53%
q49 8.18 7.42 -0.758 90.72%
q50 22.32 22.48 0.159 100.71%
q51 10.59 9.39 -1.203 88.64%
q52 1.14 1.07 -0.070 93.87%
q53 2.50 2.44 -0.061 97.56%
q54 3.94 3.67 -0.276 92.99%
q55 1.12 1.13 0.012 101.06%
q56 4.04 3.98 -0.056 98.60%
q57 10.42 10.51 0.084 100.80%
q58 2.51 2.47 -0.035 98.61%
q59 11.16 12.05 0.884 107.92%
q60 4.09 4.20 0.108 102.63%
q61 4.07 4.04 -0.032 99.21%
q62 4.70 5.20 0.504 110.72%
q63 3.48 2.45 -1.032 70.31%
q64 61.17 66.31 5.134 108.39%
q65 17.01 17.77 0.758 104.46%
q66 4.28 6.50 2.212 151.62%
q67 432.65 432.62 -0.033 99.99%
q68 3.87 3.91 0.040 101.03%
q69 7.34 5.50 -1.841 74.94%
q70 11.19 11.22 0.032 100.28%
q71 2.39 2.55 0.164 106.84%
q72 210.80 214.04 3.240 101.54%
q73 2.31 2.30 -0.003 99.85%
q74 24.25 22.67 -1.581 93.48%
q75 26.41 25.27 -1.141 95.68%
q76 13.50 12.32 -1.179 91.26%
q77 2.20 1.94 -0.266 87.91%
q78 49.87 49.53 -0.342 99.31%
q79 4.03 3.95 -0.078 98.06%
q80 11.35 11.82 0.475 104.19%
q81 4.68 4.61 -0.066 98.58%
q82 6.86 8.34 1.480 121.57%
q83 1.65 1.58 -0.076 95.42%
q84 2.93 3.04 0.112 103.83%
q85 6.67 7.82 1.150 117.24%
q86 4.36 4.32 -0.039 99.11%
q87 13.57 13.83 0.259 101.91%
q88 18.35 18.22 -0.133 99.27%
q89 3.07 3.08 0.010 100.31%
q90 2.91 3.73 0.822 128.26%
q91 2.00 2.44 0.444 122.25%
q92 1.20 1.22 0.025 102.09%
q93 40.59 42.85 2.259 105.57%
q94 27.01 26.79 -0.219 99.19%
q95 92.59 90.39 -2.199 97.63%
q96 2.72 2.75 0.031 101.16%
q97 18.04 17.96 -0.077 99.57%
q98 1.99 1.79 -0.201 89.90%
q99 10.40 11.26 0.869 108.36%
total 2243.07 2267.07 23.997 101.07%

Development

Successfully merging this pull request may close these issues.

[CH] Fully Support writing parquet and mergetree in spark 3.5.x with delta protocol
3 participants