From c066ce9967cda29a5c7580a95cc0551e34775370 Mon Sep 17 00:00:00 2001
From: James Greenhill
Date: Thu, 27 Jun 2024 08:18:20 -0700
Subject: [PATCH] feat: Update Team ClickHouse Q3 Goals

---
 .../engineering/clickhouse/clusters.mdx  | 23 ---------
 contents/teams/clickhouse/mission.mdx    |  2 +-
 contents/teams/clickhouse/objectives.mdx | 51 +++++++++++------
 3 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/contents/handbook/engineering/clickhouse/clusters.mdx b/contents/handbook/engineering/clickhouse/clusters.mdx
index 811764dde00e..8cdb28e9e532 100644
--- a/contents/handbook/engineering/clickhouse/clusters.mdx
+++ b/contents/handbook/engineering/clickhouse/clusters.mdx
@@ -291,26 +291,3 @@ Meaning that on top of the baseline speed to NVMe disk we can tier out to EBS an
 
 Currently US is entirely on NVMe backed nodes. EU will need to be migrated to this setup as well.
 
-## Future work
-
-**Objectives:**
-- All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
-- We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
-- Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
-- We should be able to spin up and down replicas of any cluster with no manual intervention.
-- We should be able to upgrade ClickHouse versions with no manual intervention.
-- We should have tooling / runbooks for resharding (if we continue down the current coordinator path)
-
-
-**Tasks:**
-- Move to NVMe backed nodes for EU
-- Move to NVMe backed nodes for Coordinator
-- Move to NVMe backed nodes for Dev
-- Infra as code for ClickHouse
-- Configs in Ansible for ClickHouse
-- Upgrade US to `im4gn.16xlarge` instances in one AZ
-- Setup Coordinators for multi-tennant workloads
-- Move to Table defined TTL based tiered storage
-- Move to ClickHouseKeeper from ZooKeeper
-
-
diff --git a/contents/teams/clickhouse/mission.mdx b/contents/teams/clickhouse/mission.mdx
index 79b16e750a76..0d32f56624c2 100644
--- a/contents/teams/clickhouse/mission.mdx
+++ b/contents/teams/clickhouse/mission.mdx
@@ -1 +1 @@
-Ensure ClickHouse runs real good.
+Build a safe, fast, and sustainable place for our users' data.
\ No newline at end of file
diff --git a/contents/teams/clickhouse/objectives.mdx b/contents/teams/clickhouse/objectives.mdx
index a5fb1d83e556..3d9a7ca66081 100644
--- a/contents/teams/clickhouse/objectives.mdx
+++ b/contents/teams/clickhouse/objectives.mdx
@@ -1,17 +1,36 @@
-James as a Service -> Clickhouse as a Service
+**Decide on the future of where we store our data and work towards that**
 
-* **Key Results:**
-    * Better visibility
-        * Regularly testing backups
-        * Monitoring/alerting
-            * Mutations
-            * Moves
-    * Management
-        * Managing/killing mutations
-    * Self serve
-        * Schema design feedback (Jams non blocking)
-        * Schema management
-    * Automation
-        * Replace/Upgrade replicas
-        * Upgrading to 24.04
-        * Disk configs
+ByConity is an interesting alternative to ClickHouse that incorporates a few features we require:
+- Rule-based and cost-based query optimizer
+- Separation of Storage and Compute
+- Virtual Warehouses
+
+A major goal for us this quarter is to evaluate the feasibility of this migration.
+
+
+**List of all goals**
+
+- Support P0 tasks such as
+  - Deletes
+  - Keeping clusters happy
+    - Provisioning more disks
+  - Schema Reviews
+  - Debugging
+  - Performance
+  - Backups/Restores
+- Decide whether ByConity is the way forward
+  - Set it up and load it with data
+  - Test performance and identify functionality/compatibility gaps
+- Branch our goals depending on ByConity's feasibility
+  - If ByConity works, migrate to it
+    - Enumerate all functionality that doesn’t work, and either update our functions or contribute to ByConity
+      - Syntax
+    - If it works on bare metal, deploy it on Kubernetes (k8s) with Karpenter
+      - Evaluate which node types we should use
+  - If ByConity doesn’t work, reshard the US cluster to match the EU cluster
+    - All clusters (Dev, US, EU) should be consistent in shape and topology. This will make it easier to manage and maintain the clusters and apply learnings from one cluster to another.
+    - We want all cluster operations to be automated and managed through some form of infra as code that is available in source control.
+    - Schema management on ClickHouse should be entirely automated and managed through source control with no exceptions. This includes Coordinator schemas.
+    - We should be able to spin up and down replicas of any cluster with no manual intervention.
+    - We should be able to upgrade ClickHouse versions with no manual intervention.
+    - We should have tooling / runbooks for resharding (if we continue down the current coordinator path)