feat: Update Team ClickHouse Q3 Goals #8841

Merged 1 commit on Jun 28, 2024
23 changes: 0 additions & 23 deletions contents/handbook/engineering/clickhouse/clusters.mdx
@@ -291,26 +291,3 @@ Meaning that on top of the baseline speed to NVMe disk we can tier out to EBS an
Currently US is entirely on NVMe backed nodes. EU will need to be migrated to this setup as well.


## Future work

**Objectives:**
- All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
- We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
- Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
- We should be able to spin up and down replicas of any cluster with no manual intervention.
- We should be able to upgrade ClickHouse versions with no manual intervention.
- We should have tooling and runbooks for resharding (if we continue down the current coordinator path).


**Tasks:**
- Move to NVMe backed nodes for EU
- Move to NVMe backed nodes for Coordinator
- Move to NVMe backed nodes for Dev
- Infra as code for ClickHouse
- Configs in Ansible for ClickHouse
- Upgrade US to `im4gn.16xlarge` instances in one AZ
- Set up Coordinators for multi-tenant workloads
- Move to table-defined, TTL-based tiered storage
- Move from ZooKeeper to ClickHouse Keeper
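The tiered-storage task above maps to ClickHouse's `TTL ... TO VOLUME` clause on MergeTree tables. A minimal sketch of the DDL, assuming a hypothetical `events` table and a storage policy with a `cold` (EBS-backed) volume already defined in the server's storage configuration:

```python
# Sketch: table-defined, TTL-based tiered storage in ClickHouse.
# Table name, columns, and the "cold" volume name are hypothetical;
# the storage policy must exist in the server's storage configuration.
def tiered_ttl_ddl(table: str, hot_days: int, policy: str = "tiered") -> str:
    """Build a CREATE TABLE statement that moves parts older than
    `hot_days` off the hot (NVMe-backed) volume to the 'cold' one."""
    return (
        f"CREATE TABLE {table} (\n"
        "    timestamp DateTime,\n"
        "    event String\n"
        ") ENGINE = MergeTree\n"
        "ORDER BY timestamp\n"
        f"TTL timestamp + INTERVAL {hot_days} DAY TO VOLUME 'cold'\n"
        f"SETTINGS storage_policy = '{policy}'"
    )

print(tiered_ttl_ddl("events", 30))
```

With the TTL owned by the table definition, tiering rules live in source-controlled schema rather than per-node disk configuration, which is what the schema-management objective above asks for.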


2 changes: 1 addition & 1 deletion contents/teams/clickhouse/mission.mdx
@@ -1 +1 @@
Ensure ClickHouse runs real good.
Build a safe, fast, and sustainable place for our users' data.
51 changes: 35 additions & 16 deletions contents/teams/clickhouse/objectives.mdx
@@ -1,17 +1,36 @@
James as a Service -> Clickhouse as a Service
**Decide on the future of where we store our data and work towards that**

* **Key Results:**
* Better visibility
* Regularly testing backups
* Monitoring/alerting
* Mutations
* Moves
* Management
* Managing/killing mutations
* Self serve
* Schema design feedback (Jams non-blocking)
* Schema management
* Automation
* Replace/Upgrade replicas
* Upgrading to 24.04
* Disk configs
ByConity is an interesting alternative to CH that incorporates a few features that we require:
- Rule- and cost-based query optimizer
- Separation of Storage and Compute
- Virtual Warehouses

A major goal for us this quarter is to evaluate the feasibility of this migration.
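One way to make the feasibility evaluation concrete is a small gap-finding harness that replays the same queries against both systems and records errors and result mismatches. A sketch, with the client calls left as placeholder callables so nothing here depends on a specific ByConity or ClickHouse client API:

```python
# Sketch of a compatibility-gap harness for the ByConity evaluation.
# `run_clickhouse` / `run_byconity` are placeholders for real client
# calls; keeping them as plain callables makes the harness itself
# backend-agnostic and easy to test.
from typing import Any, Callable

def find_gaps(queries: list[str],
              run_clickhouse: Callable[[str], Any],
              run_byconity: Callable[[str], Any]) -> list[dict]:
    """Run each query on both systems; collect errors and mismatches."""
    gaps = []
    for q in queries:
        try:
            expected = run_clickhouse(q)
        except Exception as e:
            gaps.append({"query": q, "side": "clickhouse", "error": str(e)})
            continue
        try:
            actual = run_byconity(q)
        except Exception as e:
            gaps.append({"query": q, "side": "byconity", "error": str(e)})
            continue
        if actual != expected:
            gaps.append({"query": q, "side": "both", "error": "result mismatch"})
    return gaps
```

Fed with the query log from a real workload, the resulting gap list doubles as the enumeration of functionality to fix or contribute upstream.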


**List of all goals**

- Support P0 tasks such as:
- Deletes
- Keeping clusters happy
- Provisioning more disks
- Schema reviews
- Debugging
- Performance
- Backups/Restores
- Decide whether ByConity is the way forward
- Load it with data and set it up
- Test performance, test the functionality/compatibility gaps
- Branch in goals depending on ByConity feasibility
- If ByConity works, migrate over to it
- Enumerate all functionality that doesn’t work, and either update our functions or contribute fixes to ByConity
- Syntax
- If it works on metal, put it in k8s with Karpenter
- Evaluate which nodes we should use
- If ByConity doesn’t work, reshard US to look like the EU cluster
- All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
- We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
- Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
- We should be able to spin up and down replicas of any cluster with no manual intervention.
- We should be able to upgrade ClickHouse versions with no manual intervention.
- We should have tooling and runbooks for resharding (if we continue down the current coordinator path).
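The consistency objective above (Dev, US, and EU sharing one shape and topology) lends itself to an automated check. A minimal sketch, assuming hypothetical cluster-description dicts that in practice could come from infra-as-code output or ClickHouse's `system.clusters` table:

```python
# Sketch: checking that Dev, US, and EU clusters share the same shape.
# The cluster descriptions and their keys are hypothetical; in practice
# they could be derived from infra-as-code state or system.clusters.
def topology_diffs(clusters: dict[str, dict]) -> list[str]:
    """Compare every cluster's layout against a reference (the first
    cluster given) and report any divergence."""
    names = list(clusters)
    ref_name, ref = names[0], clusters[names[0]]
    diffs = []
    for name in names[1:]:
        for key in ("shards", "replicas_per_shard", "instance_type"):
            if clusters[name].get(key) != ref.get(key):
                diffs.append(
                    f"{name}.{key}={clusters[name].get(key)!r} "
                    f"differs from {ref_name}.{key}={ref.get(key)!r}"
                )
    return diffs
```

Run from CI against the source-controlled cluster definitions, an empty diff list becomes the definition of "consistent in shape and topology".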