[feat][storage] Store Traces in ClickHouse Based on Jaeger V2 #6725

Open
wants to merge 5 commits into
base: main

zzzk1
Contributor

@zzzk1 zzzk1 commented Feb 13, 2025

Which problem is this PR solving?

Design doc: Jaeger V2: Support for ClickHouse as Storage Backend
Part of #5058

Description of the changes

  • Introduce ClickHouse clients (ch-go and clickhouse-go v2) for writing and reading traces
  • Provide test container environment for ClickHouse

How was this change tested?

  • unit tests & integration tests

Checklist

@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from a65794d to bacbf97 Compare February 13, 2025 13:51

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 87.90850% with 37 lines in your changes missing coverage. Please review.

Project coverage is 95.71%. Comparing base (4b884bb) to head (4293f5f).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/clickhouse/wrapper/wrapper.go | 42.85% | 15 Missing and 1 partial ⚠️ |
| pkg/clickhouse/config/config.go | 81.25% | 8 Missing and 4 partials ⚠️ |
| pkg/clickhouse/internal/traces.go | 96.47% | 4 Missing and 2 partials ⚠️ |
| internal/storage/v2/clickhouse/factory.go | 84.21% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6725      +/-   ##
==========================================
- Coverage   96.03%   95.71%   -0.32%     
==========================================
  Files         364      369       +5     
  Lines       20675    21002     +327     
==========================================
+ Hits        19855    20102     +247     
- Misses        626      700      +74     
- Partials      194      200       +6     
| Flag | Coverage | Δ |
|---|---|---|
| badger_v1 | 9.47% <0.00%> | (-0.30%) ⬇️ |
| badger_v2 | 1.82% <0.00%> | (-0.01%) ⬇️ |
| cassandra-4.x-v1-manual | 14.37% <0.00%> | (-0.45%) ⬇️ |
| cassandra-4.x-v2-auto | 1.81% <0.00%> | (-0.01%) ⬇️ |
| cassandra-4.x-v2-manual | 1.81% <0.00%> | (-0.01%) ⬇️ |
| cassandra-5.x-v1-manual | 14.37% <0.00%> | (-0.45%) ⬇️ |
| cassandra-5.x-v2-auto | 1.81% <0.00%> | (-0.01%) ⬇️ |
| cassandra-5.x-v2-manual | 1.81% <0.00%> | (-0.01%) ⬇️ |
| clickhouse-25.x-v2 | 1.12% <28.75%> | (?) |
| elasticsearch-6.x-v1 | 18.58% <0.00%> | (-0.57%) ⬇️ |
| elasticsearch-7.x-v1 | 18.66% <0.00%> | (-0.58%) ⬇️ |
| elasticsearch-8.x-v1 | 18.82% <0.00%> | (-0.58%) ⬇️ |
| elasticsearch-8.x-v2 | 1.82% <0.00%> | (-0.01%) ⬇️ |
| grpc_v1 | 10.49% <0.00%> | (-0.33%) ⬇️ |
| grpc_v2 | 7.80% <0.00%> | (-0.01%) ⬇️ |
| kafka-3.x-v1 | ? | |
| kafka-3.x-v2 | ? | |
| memory_v2 | 1.82% <0.00%> | (-0.01%) ⬇️ |
| opensearch-1.x-v1 | 18.71% <0.00%> | (-0.58%) ⬇️ |
| opensearch-2.x-v1 | 18.71% <0.00%> | (-0.58%) ⬇️ |
| opensearch-2.x-v2 | 1.82% <0.00%> | (-0.01%) ⬇️ |
| tailsampling-processor | 0.48% <0.00%> | (-0.01%) ⬇️ |
| unittests | 94.51% <65.68%> | (-0.42%) ⬇️ |

@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from bacbf97 to 21c3fab Compare February 13, 2025 14:30
@zzzk1 zzzk1 changed the title from "[feat][storage]: add write path for ClickHouse based on Jaeger V2" to "[feat][storage] add write path for ClickHouse based on Jaeger V2" Feb 13, 2025
@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch 5 times, most recently from a16370b to 72d91cc Compare February 14, 2025 07:24
Contributor Author

zzzk1 commented Feb 14, 2025

@yurishkuro The ClickHouse integration test is not working in CI. Is there anything I might have missed?

@@ -0,0 +1,58 @@
// Copyright (c) 2025 The Jaeger Authors.
Contributor Author

All of these methods require a reliable connection to the ClickHouse server; they should not be unit tested.

@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from 72d91cc to 51b3f8f Compare February 14, 2025 10:52
@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from 51b3f8f to b8915ef Compare February 14, 2025 10:52
@zzzk1 zzzk1 marked this pull request as ready for review February 14, 2025 11:00
@zzzk1 zzzk1 requested a review from a team as a code owner February 14, 2025 11:00
@zzzk1 zzzk1 requested a review from albertteoh February 14, 2025 11:00
@dosubot dosubot bot added the area/storage, docker, and v2 labels Feb 14, 2025
@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from 572c1a2 to 7718d65 Compare February 15, 2025 11:31
@zzzk1 zzzk1 changed the title from "[feat][storage] add write path for ClickHouse based on Jaeger V2" to "[feat][storage] Store Traces in ClickHouse Based on Jaeger V2" Feb 15, 2025
Signed-off-by: zzzk1 <[email protected]>
@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from 7718d65 to 19bc811 Compare February 15, 2025 12:39
Signed-off-by: zzzk1 <[email protected]>
@zzzk1 zzzk1 force-pushed the write-path-for-clickhouse branch from 19bc811 to c408a25 Compare February 15, 2025 12:43
Signed-off-by: zzzk1 <[email protected]>
Contributor Author

zzzk1 commented Feb 15, 2025

I would like to implement all basic features, ensure that the integration tests pass, and then return to address other low-priority tasks.

Member

@yurishkuro yurishkuro left a comment

before moving code around I suggest you read & understand the comments and then propose a new directory structure that we can agree on. This will reduce the churn.

workflow_call:

concurrency:
  group: cit-kafka-${{ github.workflow }}-${{ (github.event.pull_request && github.event.pull_request.number) || github.ref || github.run_id }}
Member

Suggested change
- group: cit-kafka-${{ github.workflow }}-${{ (github.event.pull_request && github.event.pull_request.number) || github.ref || github.run_id }}
+ group: cit-clickhouse-${{ github.workflow }}-${{ (github.event.pull_request && github.event.pull_request.number) || github.ref || github.run_id }}

strategy:
  fail-fast: false
  matrix:
    jaeger-version: [v2]
Member

we don't plan to support CH in v1, so this doesn't need to be part of the matrix

  run: bash scripts/e2e/clickhouse.sh

- uses: ./.github/actions/verify-metrics-snapshot
  if: matrix.jaeger-version == 'v2'
Member

always true

ports:
  - 9000:9000
volumes:
  - ./init.sql:/docker-entrypoint-initdb.d/init.sql
Member

we are trying to move away from having external steps for setting up database - Jaeger should be able to run when started against a blank database. That means the schema creation logic should be embedded in the implementation (similar to #5922).
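
A minimal sketch of what embedding schema creation in the implementation could look like. Everything here is an assumption for illustration: the `schemaDDL` contents, the `execer` interface (modeled on the `Exec(ctx, query, args...)` method shape that clickhouse-go v2 connections expose), and the helper names are hypothetical and not taken from this PR or from #5922.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// schemaDDL is a hypothetical, heavily simplified schema. The PR's real
// init.sql is not shown in this thread.
const schemaDDL = `
CREATE DATABASE IF NOT EXISTS jaeger;
CREATE TABLE IF NOT EXISTS jaeger.spans (
    trace_id String,
    span_id String,
    start_time DateTime64(9)
) ENGINE = MergeTree ORDER BY (trace_id, start_time);
`

// execer is the only capability schema creation needs (assumed interface).
type execer interface {
	Exec(ctx context.Context, query string, args ...any) error
}

// splitStatements splits a script on ';' and drops empty fragments.
// The naive split assumes no semicolons inside string literals or comments.
func splitStatements(script string) []string {
	var out []string
	for _, part := range strings.Split(script, ";") {
		if stmt := strings.TrimSpace(part); stmt != "" {
			out = append(out, stmt)
		}
	}
	return out
}

// ensureSchema runs each statement in order. Because every statement uses
// IF NOT EXISTS, it can run against a blank or an initialized database.
func ensureSchema(ctx context.Context, db execer, script string) error {
	for _, stmt := range splitStatements(script) {
		if err := db.Exec(ctx, stmt); err != nil {
			return fmt.Errorf("schema statement %q failed: %w", stmt, err)
		}
	}
	return nil
}

// recordingExecer is a test double that records queries instead of
// sending them to a server.
type recordingExecer struct{ queries []string }

func (r *recordingExecer) Exec(_ context.Context, query string, _ ...any) error {
	r.queries = append(r.queries, query)
	return nil
}

// demoEnsureSchema runs the bundled DDL against the test double and
// returns the statements that would have been executed.
func demoEnsureSchema() ([]string, error) {
	rec := &recordingExecer{}
	err := ensureSchema(context.Background(), rec, schemaDDL)
	return rec.queries, err
}
```

With this shape the storage factory could call `ensureSchema` at startup when automatic schema creation is enabled, and the docker-compose volume mount of `init.sql` would no longer be needed.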

}

func ignoreStartAutoCloseIdleConnections() goleak.Option {
	return goleak.IgnoreTopFunction("github.com/ClickHouse/clickhouse-go/v2.(*clickhouse).startAutoCloseIdleConnections")
Member

adding exclusions here should be the last resort. Are there known upstream bugs that prevent clean shutdown of these goroutines?
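
For context, a sketch of the clean-shutdown pattern that makes such an exclusion unnecessary: a background goroutine that `Close` both signals and waits for. The `idleReaper` type is a hypothetical stand-in written only with the standard library; it is not clickhouse-go's implementation, and whether the upstream goroutine can be stopped this way is exactly the open question above.

```go
package main

import (
	"sync"
	"time"
)

// idleReaper is a hypothetical client component that runs periodic
// maintenance (for example, closing idle connections) in the background.
type idleReaper struct {
	stop chan struct{}
	wg   sync.WaitGroup
	once sync.Once
}

// newIdleReaper starts the background goroutine.
func newIdleReaper(interval time.Duration) *idleReaper {
	r := &idleReaper{stop: make(chan struct{})}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Periodic maintenance would happen here.
			case <-r.stop:
				return
			}
		}
	}()
	return r
}

// Close signals the goroutine to exit and blocks until it has returned,
// so a leak detector run after Close sees no leftover goroutine.
// It is safe to call more than once.
func (r *idleReaper) Close() {
	r.once.Do(func() { close(r.stop) })
	r.wg.Wait()
}

// closedWithin reports whether Close returns within d.
func closedWithin(r *idleReaper, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		r.Close()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// reaperStopsCleanly starts a reaper and checks that Close returns
// promptly, including on a second call.
func reaperStopsCleanly() bool {
	r := newIdleReaper(time.Hour)
	return closedWithin(r, time.Second) && closedWithin(r, time.Second)
}
```

If the real client's `Close` only signals the goroutine without waiting for it, the leak check can still race with the goroutine's exit, which would be worth confirming upstream before adding an ignore rule.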

if c.Pool != nil {
	c.Pool.Close()
}
if c.Conn != nil {
Member

when would it be nil?

// Copyright (c) 2025 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package wrapper
Member

pkg/clickhouse/wrapper/wrapper.go

please keep everything CH-related under internal/storage/v2/clickhouse

client clickhouse.Client
}

func NewFactoryWithConfig(configuration *config.Configuration, logger *zap.Logger) (*Factory, error) {
Member

Suggested change
- func NewFactoryWithConfig(configuration *config.Configuration, logger *zap.Logger) (*Factory, error) {
+ func NewFactory(cfg *config.Configuration, logger *zap.Logger) (*Factory, error) {

}
}

func (c *Configuration) NewClient(logger *zap.Logger) (clickhouse.Client, error) {
Member

this should be moved to a separate package (perhaps close to wrapper) so that config logic can be 100% testable and not dependent on runtime connections to db.
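
One way to read this suggestion, as a sketch only: keep validation as pure logic on the configuration type and inject the connection step from a separate package. The field names, `Validate`, `Client`, `Dialer`, and this `NewClient` signature are all hypothetical and are not the PR's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Configuration holds a hypothetical subset of connection settings.
type Configuration struct {
	Address  string
	Database string
	Username string
	Password string
}

// Validate is pure logic: it needs no server, so it can be fully
// covered by unit tests.
func (c *Configuration) Validate() error {
	if c.Database == "" {
		return errors.New("database must not be empty")
	}
	if _, _, err := net.SplitHostPort(c.Address); err != nil {
		return fmt.Errorf("invalid address %q: %w", c.Address, err)
	}
	return nil
}

// Client is a placeholder for whatever the client package returns.
type Client interface {
	Close() error
}

// Dialer opens a real connection. It would live in a separate package
// next to the wrapper and be exercised by integration tests.
type Dialer func(cfg Configuration) (Client, error)

// NewClient validates first and only then touches the network.
func NewClient(cfg Configuration, dial Dialer) (Client, error) {
	if err := cfg.Validate(); err != nil {
		return nil, err
	}
	return dial(cfg)
}
```

Because an invalid configuration is rejected before `dial` is called, unit tests can cover every validation branch without a running ClickHouse instance.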

testutils.VerifyGoLeaksOnce(t)
})
s := &ClickhouseIntegrationTestSuite{}
s.initialize(t)
Member

this doesn't actually run the test?

Contributor Author

zzzk1 commented Feb 18, 2025

@yurishkuro The key point is that the ch-go and clickhouse-go clients are designed very differently, which makes it difficult to abstract them. It is therefore best to isolate them and use the minimal configuration required for initialization, as follows:

schema:
  #auto run DDL script
  auto: true
client:
  database: jaeger
  username: default
  password: default
  #ch-go
  writer:
    address: "127.0.0.1:9200"
    pool:
      max_connection_lifetime: 3600000000000
      max_connection_idle_time: 1800000000000
      #CPU Core number
      min_connections: 4
      #CPU Core number * 2
      max_connections: 8
      health_check_period: 60000000000
  #clickhouse-go
  reader:
    #no cluster just a different field here.
    addresses: ["node00:9200","node01:9200","node02:9200"]
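
The large integers under `pool` are consistent with nanosecond counts, which is how Go represents `time.Duration`. Assuming the loader decodes them into `time.Duration` fields (an assumption; the config struct is not shown in this thread, and `PoolConfig` below is hypothetical), they would read as follows:

```go
package main

import "time"

// PoolConfig mirrors the duration keys under `pool` (hypothetical struct).
type PoolConfig struct {
	MaxConnectionLifetime time.Duration
	MaxConnectionIdleTime time.Duration
	HealthCheckPeriod     time.Duration
}

// examplePool uses the raw values from the YAML above, interpreted as
// nanoseconds.
func examplePool() PoolConfig {
	return PoolConfig{
		MaxConnectionLifetime: time.Duration(3600000000000), // 1 hour
		MaxConnectionIdleTime: time.Duration(1800000000000), // 30 minutes
		HealthCheckPeriod:     time.Duration(60000000000),   // 1 minute
	}
}
```

If the loader also accepts duration strings, writing `1h`, `30m`, and `1m` in the file would be easier to review than raw nanoseconds.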

The directory structure has been adjusted as follows:

internal/storage/v2/clickhouse/

  • client
    • conn: creates a clickhouse-go connection.
    • pool: creates a ch-go connection pool.
  • config: all configuration items, matching the configuration file.
  • internal: tools required to write traces into the database.
  • wrapper: wrapper that isolates upper-layer calls from third-party implementations.
  • schema: DDL scripts for initialization, plus automatic database initialization.
  • tracestore: read and write operations for traces.
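
As a rough illustration of the isolation the wrapper package is meant to provide, the sketch below defines one narrow interface per path, so that tracestore never imports ch-go or clickhouse-go directly. All names (`Span`, `SpanWriter`, `SpanReader`, `memoryBackend`, `roundTrip`) are hypothetical, and the in-memory backend only stands in for the two real clients.

```go
package main

import (
	"context"
	"errors"
)

// Span is a minimal, hypothetical span record used only in this sketch.
type Span struct {
	TraceID string
	Name    string
}

// SpanWriter is what the write path would need from a ch-go backed pool.
type SpanWriter interface {
	WriteSpans(ctx context.Context, spans []Span) error
}

// SpanReader is what the read path would need from a clickhouse-go
// connection.
type SpanReader interface {
	FindSpans(ctx context.Context, traceID string) ([]Span, error)
}

// memoryBackend is an in-memory stand-in for both third-party clients.
type memoryBackend struct {
	spans []Span
}

func (m *memoryBackend) WriteSpans(_ context.Context, spans []Span) error {
	m.spans = append(m.spans, spans...)
	return nil
}

func (m *memoryBackend) FindSpans(_ context.Context, traceID string) ([]Span, error) {
	var out []Span
	for _, s := range m.spans {
		if s.TraceID == traceID {
			out = append(out, s)
		}
	}
	if len(out) == 0 {
		return nil, errors.New("trace not found")
	}
	return out, nil
}

// roundTrip writes one span and reads its trace back using only the
// two interfaces, the way upper layers would.
func roundTrip(w SpanWriter, r SpanReader, s Span) ([]Span, error) {
	ctx := context.Background()
	if err := w.WriteSpans(ctx, []Span{s}); err != nil {
		return nil, err
	}
	return r.FindSpans(ctx, s.TraceID)
}
```

Keeping the writer and reader interfaces separate avoids forcing one abstraction over two differently designed clients, while still letting tracestore be unit tested against a fake.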

Labels
area/storage, docker, v2
2 participants