feat: Update Team ClickHouse Q3 Goals (#8841)
fuziontech authored Jun 28, 2024
1 parent 3f04653 commit 64b2c23
Showing 3 changed files with 36 additions and 40 deletions.
23 changes: 0 additions & 23 deletions contents/handbook/engineering/clickhouse/clusters.mdx
@@ -291,26 +291,3 @@ Meaning that on top of the baseline speed to NVMe disk we can tier out to EBS an
Currently US is entirely on NVMe backed nodes. EU will need to be migrated to this setup as well.


## Future work

**Objectives:**
- All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
- We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
- Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
- We should be able to spin up and down replicas of any cluster with no manual intervention.
- We should be able to upgrade ClickHouse versions with no manual intervention.
- We should have tooling / runbooks for resharding (if we continue down the current coordinator path)


**Tasks:**
- Move to NVMe backed nodes for EU
- Move to NVMe backed nodes for Coordinator
- Move to NVMe backed nodes for Dev
- Infra as code for ClickHouse
- Configs in Ansible for ClickHouse
- Upgrade US to `im4gn.16xlarge` instances in one AZ
- Set up Coordinators for multi-tenant workloads
- Move to Table defined TTL based tiered storage
- Move to ClickHouseKeeper from ZooKeeper


2 changes: 1 addition & 1 deletion contents/teams/clickhouse/mission.mdx
@@ -1 +1 @@
Ensure ClickHouse runs real good.
Build a safe, fast, and sustainable place for our users' data.
51 changes: 35 additions & 16 deletions contents/teams/clickhouse/objectives.mdx
@@ -1,17 +1,36 @@
James as a Service -> ClickHouse as a Service
**Decide on the future of where we store our data and work towards that**

* **Key Results:**
* Better visibility
* Regularly testing backups
* Monitoring/alerting
* Mutations
* Moves
* Management
* Managing/killing mutations
* Self serve
* Schema design feedback (Jams non blocking)
* Schema management
* Automation
* Replace/Upgrade replicas
* Upgrading to 24.04
* Disk configs
ByConity is an interesting alternative to CH that incorporates a few features that we require:
- Rules and cost based query optimizer
- Separation of Storage and Compute
- Virtual Warehouses

A major goal for us this quarter is to evaluate the feasibility of this migration.


**List of all goals**

- Support P0 tasks such as
    - Deletes
    - Keeping clusters happy
        - provisioning more disks
    - Schema Reviews
    - Debugging
    - Performance
    - Backups/Restores
- Decide whether ByConity is the way forward
    - Load it with data, set up
    - Test performance, test the functionality/compatibility gaps
- Branch in goals depending on ByConity feasibility
    - If ByConity works, migrate over to it
        - Enumerate all functionality that doesn’t work and update the functions/contribute to ByConity
            - Syntax
        - If it works on metal, put it in k8s with Karpenter
            - Evaluate which nodes we should use
    - If ByConity doesn’t work, reshard US to look like EU cluster
- All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
- We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
- Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
- We should be able to spin up and down replicas of any cluster with no manual intervention.
- We should be able to upgrade ClickHouse versions with no manual intervention.
- We should have tooling / runbooks for resharding (if we continue down the current coordinator path)
