From 5a984b218fddff8e521334f3eec3e4d6132ada1b Mon Sep 17 00:00:00 2001
From: Phil Mwago <41321750+Phil-Mwago@users.noreply.github.com>
Date: Fri, 11 Oct 2024 10:35:51 +0300
Subject: [PATCH] analytics folder move (#1609)

---
 .../building/guides/database/couch2pg-oom-errors.md | 2 +-
 content/en/building/tutorials/couch2pg-setup.md | 2 +-
 content/en/core/overview/cht-sync.md | 4 ++--
 .../en/core/overview/data-flows-for-analytics.md | 2 +-
 content/en/hosting/3.x/ec2-setup-guide.md | 2 +-
 .../guides/data => hosting}/analytics/_index.html | 0
 .../analytics/building-dbt-models.md | 4 ++--
 .../building-dbt-models/cht-pipeline-er.dot | 0
 .../building-dbt-models/cht-pipeline-er.png | Bin
 .../analytics/couch2pg-to-cht-sync-migration.md | 12 ++++++------
 .../analytics/environment-variables.md | 0
 .../data => hosting}/analytics/introduction.md | 0
 .../guides/data => hosting}/analytics/production.md | 2 +-
 .../guides/data => hosting}/analytics/setup.md | 2 +-
 .../analytics/testing-dbt-models.md | 0
 content/en/hosting/monitoring/setup.md | 2 +-
 16 files changed, 17 insertions(+), 17 deletions(-)
 rename content/en/{building/guides/data => hosting}/analytics/_index.html (100%)
 rename content/en/{building/guides/data => hosting}/analytics/building-dbt-models.md (97%)
 rename content/en/{building/guides/data => hosting}/analytics/building-dbt-models/cht-pipeline-er.dot (100%)
 rename content/en/{building/guides/data => hosting}/analytics/building-dbt-models/cht-pipeline-er.png (100%)
 rename content/en/{building/guides/data => hosting}/analytics/couch2pg-to-cht-sync-migration.md (79%)
 rename content/en/{building/guides/data => hosting}/analytics/environment-variables.md (100%)
 rename content/en/{building/guides/data => hosting}/analytics/introduction.md (100%)
 rename content/en/{building/guides/data => hosting}/analytics/production.md (92%)
 rename content/en/{building/guides/data => hosting}/analytics/setup.md (97%)
 rename content/en/{building/guides/data => hosting}/analytics/testing-dbt-models.md (100%)

diff --git a/content/en/building/guides/database/couch2pg-oom-errors.md b/content/en/building/guides/database/couch2pg-oom-errors.md
index 512e4f007..f30ef5a59 100644
--- a/content/en/building/guides/database/couch2pg-oom-errors.md
+++ b/content/en/building/guides/database/couch2pg-oom-errors.md
@@ -11,7 +11,7 @@ aliases:
 ---
 
 {{% pageinfo %}}
-Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "building/guides/data/analytics" >}}).
+Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "hosting/analytics" >}}).
 {{% /pageinfo %}}
 
 Some times when couch2pg is replicating documents to postgres, it encounters very large info docs that are larger than the memory allocation of the document sync array and causes out-of-memory errors.
diff --git a/content/en/building/tutorials/couch2pg-setup.md b/content/en/building/tutorials/couch2pg-setup.md
index 34dce6da0..faff722fe 100644
--- a/content/en/building/tutorials/couch2pg-setup.md
+++ b/content/en/building/tutorials/couch2pg-setup.md
@@ -11,7 +11,7 @@ aliases:
 ---
 
 {{% pageinfo %}}
-Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "building/guides/data/analytics" >}}).
+Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "hosting/analytics" >}}).
 {{% /pageinfo %}}
 
 This tutorial will take you through setting up a Couch2pg service.
diff --git a/content/en/core/overview/cht-sync.md b/content/en/core/overview/cht-sync.md
index db90a9802..a176ab60d 100644
--- a/content/en/core/overview/cht-sync.md
+++ b/content/en/core/overview/cht-sync.md
@@ -9,13 +9,13 @@ aliases:
 relatedContent: >
   core/overview/architecture
   core/overview/data-flows-for-analytics/
-  building/guides/data/analytics/
+  hosting/analytics/
 ---
 
 ## Overview
 
 CHT Sync is an integrated solution designed to enable data synchronization between CouchDB and PostgreSQL for the purpose of analytics.
 It combines several technologies to achieve this synchronization and provides an efficient workflow for data processing and visualization.
 The synchronization occurs in near real-time, ensuring that the data displayed on dashboards is up-to-date.
 
-Read more about setting up [CHT Sync]({{< relref "building/guides/data/analytics/setup" >}}).
+Read more about setting up [CHT Sync]({{< relref "hosting/analytics/setup" >}}).
diff --git a/content/en/core/overview/data-flows-for-analytics.md b/content/en/core/overview/data-flows-for-analytics.md
index 0a28665a5..e55ec8251 100644
--- a/content/en/core/overview/data-flows-for-analytics.md
+++ b/content/en/core/overview/data-flows-for-analytics.md
@@ -46,7 +46,7 @@ Ultimately all the data ends up in a CouchDB instance deployed in the cloud whet
 
 #### 2. Data Transformation
 
-[CHT Sync]({{< relref "core/overview/cht-sync" >}}) is used to move data from CouchDB to a relational database, PostgreSQL in this case. The choice of PostgreSQL for analytics dashboard data sources is to allow use of the more familiar SQL querying. It is an open source tool that can be [easily deployed]({{< ref "building/guides/data/analytics" >}}). When deployed the service uses [CouchDB's changes feed](https://docs.couchdb.org/en/stable/api/database/changes.html) which allows capturing of everything happening in CouchDB in incremental updates. It is run and monitored by the operating system where it is configured to fetch data at a configurable interval.
+[CHT Sync]({{< relref "core/overview/cht-sync" >}}) is used to move data from CouchDB to a relational database, PostgreSQL in this case. The choice of PostgreSQL for analytics dashboard data sources is to allow use of the more familiar SQL querying. It is an open source tool that can be [easily deployed]({{< ref "hosting/analytics" >}}). When deployed the service uses [CouchDB's changes feed](https://docs.couchdb.org/en/stable/api/database/changes.html) which allows capturing of everything happening in CouchDB in incremental updates. It is run and monitored by the operating system where it is configured to fetch data at a configurable interval.
 
 Data copied over to PostgreSQL is first stored as raw json (document) making use of PostgreSQL's jsonb data type to create an exact replica of a CouchDB database. From this, default views are created at deployment of the service and refreshed during every subsequent run. Additional custom materialized views created later are also refreshed at this time.
 
diff --git a/content/en/hosting/3.x/ec2-setup-guide.md b/content/en/hosting/3.x/ec2-setup-guide.md
index d4ecd0bc8..3ee1f3e45 100644
--- a/content/en/hosting/3.x/ec2-setup-guide.md
+++ b/content/en/hosting/3.x/ec2-setup-guide.md
@@ -61,7 +61,7 @@ This guide will walk you through the process of creating an EC2 instance, mounti
    - See [SSL Certificates]({{< relref "hosting/3.x/ssl-cert-install">}}) to install new certificates
 
 1. Configure CHT Sync
-   See the [CHT Sync configuration]({{< relref "building/guides/data/analytics">}}).
+   See the [CHT Sync configuration]({{< relref "hosting/analytics">}}).
 
 1. Setup postgres to work with CHT Sync
    - Creating the database, setting up permissions, exploring the tables and what they store
diff --git a/content/en/building/guides/data/analytics/_index.html b/content/en/hosting/analytics/_index.html
similarity index 100%
rename from content/en/building/guides/data/analytics/_index.html
rename to content/en/hosting/analytics/_index.html
diff --git a/content/en/building/guides/data/analytics/building-dbt-models.md b/content/en/hosting/analytics/building-dbt-models.md
similarity index 97%
rename from content/en/building/guides/data/analytics/building-dbt-models.md
rename to content/en/hosting/analytics/building-dbt-models.md
index 8313f6c14..a5c31405a 100644
--- a/content/en/building/guides/data/analytics/building-dbt-models.md
+++ b/content/en/hosting/analytics/building-dbt-models.md
@@ -30,7 +30,7 @@ packages:
 ```
 
 To avoid breaking changes in downstream models, include `revision` in the dependency, which should be a version tag for `cht-pipeline`.
-In CHT Sync config, set the URL of dbt GitHub repository to the `CHT_PIPELINE_BRANCH_URL` [environment variable]({{< relref "building/guides/data/analytics/environment-variables" >}}), either in `.env` if using `docker compose`, or in `values.yaml` if using Kubernetes.
+In CHT Sync config, set the URL of dbt GitHub repository to the `CHT_PIPELINE_BRANCH_URL` [environment variable]({{< relref "hosting/analytics/environment-variables" >}}), either in `.env` if using `docker compose`, or in `values.yaml` if using Kubernetes.
 
 ### Deploying models
 
@@ -48,7 +48,7 @@ When it is necessary to update the base models, update the version tag in the de
 
 ### Testing models and dashboards
 
-It is highly encouraged to write [dbt tests]({{< ref "building/guides/data/analytics/testing-dbt-models" >}}) for application-specific models to ensure that they are accurate and to avoid releasing broken models. Examples can be found in the [cht-pipeline repository](https://github.com/medic/cht-pipeline/tree/main/tests).
+It is highly encouraged to write [dbt tests]({{< ref "hosting/analytics/testing-dbt-models" >}}) for application-specific models to ensure that they are accurate and to avoid releasing broken models. Examples can be found in the [cht-pipeline repository](https://github.com/medic/cht-pipeline/tree/main/tests).
 
 ## Base Models
 
diff --git a/content/en/building/guides/data/analytics/building-dbt-models/cht-pipeline-er.dot b/content/en/hosting/analytics/building-dbt-models/cht-pipeline-er.dot
similarity index 100%
rename from content/en/building/guides/data/analytics/building-dbt-models/cht-pipeline-er.dot
rename to content/en/hosting/analytics/building-dbt-models/cht-pipeline-er.dot
diff --git a/content/en/building/guides/data/analytics/building-dbt-models/cht-pipeline-er.png b/content/en/hosting/analytics/building-dbt-models/cht-pipeline-er.png
similarity index 100%
rename from content/en/building/guides/data/analytics/building-dbt-models/cht-pipeline-er.png
rename to content/en/hosting/analytics/building-dbt-models/cht-pipeline-er.png
diff --git a/content/en/building/guides/data/analytics/couch2pg-to-cht-sync-migration.md b/content/en/hosting/analytics/couch2pg-to-cht-sync-migration.md
similarity index 79%
rename from content/en/building/guides/data/analytics/couch2pg-to-cht-sync-migration.md
rename to content/en/hosting/analytics/couch2pg-to-cht-sync-migration.md
index 4c698cac0..6438c8def 100644
--- a/content/en/building/guides/data/analytics/couch2pg-to-cht-sync-migration.md
+++ b/content/en/hosting/analytics/couch2pg-to-cht-sync-migration.md
@@ -11,21 +11,21 @@ aliases:
   - /apps/guides/data/analytics/couch2pg-to-cht-sync-migration
 ---
 
-This page outlines guidelines for migrating from [couch2pg](https://github.com/medic/cht-couch2pg) to the data pipeline based on CHT Sync. One of the main changes in this flow is separating the syncing process from the data transformation, with dbt now handling the latter in [cht-pipeline](https://github.com/medic/cht-pipeline/). This migration requires dbt models in the cht-pipeline repository instead of SQL views and tables. One thing to note is that the schema for CHT Sync differs from couch2pg, so dbt models will not directly replace the SQL views and tables. For instructions on how to get started with dbt models, refer to the [dbt models guide]({{< relref "building/guides/data/analytics/testing-dbt-models" >}}).
+This page outlines guidelines for migrating from [couch2pg](https://github.com/medic/cht-couch2pg) to the data pipeline based on CHT Sync. One of the main changes in this flow is separating the syncing process from the data transformation, with dbt now handling the latter in [cht-pipeline](https://github.com/medic/cht-pipeline/). This migration requires dbt models in the cht-pipeline repository instead of SQL views and tables. One thing to note is that the schema for CHT Sync differs from couch2pg, so dbt models will not directly replace the SQL views and tables. For instructions on how to get started with dbt models, refer to the [dbt models guide]({{< relref "hosting/analytics/testing-dbt-models" >}}).
 
 ## Key Considerations
 - **Server resources**: To minimize downtime, running both couch2pg and CHT Sync in parallel during the migration process is recommended. With this in mind, ensure that the server and database resources are sufficient to handle the load.
 - **dbt Modelling**: Avoid the temptation to model new dbt models after existing SQL views and tables. Instead, take the opportunity to re-evaluate the data needs and design new models that are more efficient and effective. Think of what data needs to be shown and how it should be shown in data visualization tools and use that to guide the design of the new models.
-- **Testing**: After migrating, thoroughly test the new dbt models to ensure that they work as expected. Refer to the [testing dbt models guide]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) for more information on testing.
+- **Testing**: After migrating, thoroughly test the new dbt models to ensure that they work as expected. Refer to the [testing dbt models guide]({{< relref "hosting/analytics/building-dbt-models" >}}) for more information on testing.
 - **Feedback**: Provide any feedback and create issues for errors or bugs encountered in the [cht-sync](https://github.com/medic/cht-sync) and [cht-pipeline](https://github.com/medic/cht-pipeline/) repositories to improve the tools.
 
 ## Migration Steps
 1. **Plan the migration**: Determine the scope of the migration, including the data sources, the data models, and the data transformations. Identify the existing SQL views, tables, and dashboards and assess what data you want to visualize.
-1. **Set up CHT Sync**: Follow the instructions [to setup CHT Sync locally]({{< relref "building/guides/data/analytics/setup" >}}).
-1. **Build dbt models**: Use the [dedicated guidelines]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) to build dbt models for the data you want to visualize.
-1. **Deploy CHT Sync**: Once the dbt models are tested locally and working as expected, deploy CHT Sync in production. Follow [the instructions to set up CHT Sync in production]({{< relref "building/guides/data/analytics/production" >}}). It is recommended that CHT Sync be run in parallel with couch2pg during the migration process. This minimises disruption to users of the existing dashboards because they can continue to use the existing data while the new pipeline is being set up. It also makes it easier to compare the data from couch2pg when testing the new pipeline.
+1. **Set up CHT Sync**: Follow the instructions [to setup CHT Sync locally]({{< relref "hosting/analytics/setup" >}}).
+1. **Build dbt models**: Use the [dedicated guidelines]({{< relref "hosting/analytics/building-dbt-models" >}}) to build dbt models for the data you want to visualize.
+1. **Deploy CHT Sync**: Once the dbt models are tested locally and working as expected, deploy CHT Sync in production. Follow [the instructions to set up CHT Sync in production]({{< relref "hosting/analytics/production" >}}). It is recommended that CHT Sync be run in parallel with couch2pg during the migration process. This minimises disruption to users of the existing dashboards because they can continue to use the existing data while the new pipeline is being set up. It also makes it easier to compare the data from couch2pg when testing the new pipeline.
 1. **Create replica dashboards**: In the data visualization tool of your choice, create replica dashboards of your current setup and compare the data from the old and new pipelines.
 1. **Test and adjust the dbt models**: Test the dbt models to ensure they are working as expected and that the replica and initial dashboards match. Adjust the models to ensure they are accurate.
-1. **Optimize**: Once the dbt models are working as expected and the dashboards display the expected data, optimize the models to improve performance. This may involve restructuring the models, adding indexes, or making other adjustments to improve the speed and efficiency of the models. Having a look at the [dbt models guide]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) will help you understand how to optimize the models.
+1. **Optimize**: Once the dbt models are working as expected and the dashboards display the expected data, optimize the models to improve performance. This may involve restructuring the models, adding indexes, or making other adjustments to improve the speed and efficiency of the models. Having a look at the [dbt models guide]({{< relref "hosting/analytics/building-dbt-models" >}}) will help you understand how to optimize the models.
 1. **Set up monitoring and alerting**: [Setup CHT Watchdog]({{< relref "hosting/monitoring/setup" >}}) to monitor the running of CHT Sync and set up alerts for any failures.
 1. **Remove couch2pg and the duplicate database**: Once the new pipeline runs as expected, you can remove couch2pg and the duplicate database. Ensure that all data is being synced correctly, that the dbt models are working as expected, and that the dashboards display the expected data before switching them off and removing couch2pg.
diff --git a/content/en/building/guides/data/analytics/environment-variables.md b/content/en/hosting/analytics/environment-variables.md
similarity index 100%
rename from content/en/building/guides/data/analytics/environment-variables.md
rename to content/en/hosting/analytics/environment-variables.md
diff --git a/content/en/building/guides/data/analytics/introduction.md b/content/en/hosting/analytics/introduction.md
similarity index 100%
rename from content/en/building/guides/data/analytics/introduction.md
rename to content/en/hosting/analytics/introduction.md
diff --git a/content/en/building/guides/data/analytics/production.md b/content/en/hosting/analytics/production.md
similarity index 92%
rename from content/en/building/guides/data/analytics/production.md
rename to content/en/hosting/analytics/production.md
index 26e565e6c..c4b730cdf 100644
--- a/content/en/building/guides/data/analytics/production.md
+++ b/content/en/hosting/analytics/production.md
@@ -19,7 +19,7 @@ We recommend running [CHT Sync](https://github.com/medic/cht-sync) in production
 - Helm: The Kubernetes package manager. You can install it using the [helm installation guide](https://helm.sh/docs/intro/install/).
 
 ## Database disk space requirements
-The disk space required for the database depends on a few things including the size the of CouchDB databases being replicated, and the [models]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) defined. The database will grow over time as more data is added to CouchDB. The database should be monitored to ensure that it has enough space to accommodate the data. To get an idea of the size requirements of the database, you can replicate 10% of the data from CouchDB to Postgres and then run the following command to see disk usage:
+The disk space required for the database depends on a few things including the size the of CouchDB databases being replicated, and the [models]({{< relref "hosting/analytics/building-dbt-models" >}}) defined. The database will grow over time as more data is added to CouchDB. The database should be monitored to ensure that it has enough space to accommodate the data. To get an idea of the size requirements of the database, you can replicate 10% of the data from CouchDB to Postgres and then run the following command to see disk usage:
 ```shell
 SELECT pg_size_pretty(pg_database_size('your_database_name'));
 ```
diff --git a/content/en/building/guides/data/analytics/setup.md b/content/en/hosting/analytics/setup.md
similarity index 97%
rename from content/en/building/guides/data/analytics/setup.md
rename to content/en/hosting/analytics/setup.md
index 0ff63ad08..bbf9bd45f 100644
--- a/content/en/building/guides/data/analytics/setup.md
+++ b/content/en/hosting/analytics/setup.md
@@ -17,7 +17,7 @@ These instructions assume you're running CHT Sync, CHT Core and PostgreSQL eithe
 
 ## Setup
 
-Copy the values in `env.template` file to the `.env` file. For more information, see the references on the [Environment variables page]({{< relref "building/guides/data/analytics/environment-variables" >}}).
+Copy the values in `env.template` file to the `.env` file. For more information, see the references on the [Environment variables page]({{< relref "hosting/analytics/environment-variables" >}}).
 
 {{% alert title="Note" %}}
 The first time you run the commands from any of the sections below it will need to download many Docker images and will take a while. You'll know it's done when you see `#8 DONE 0.0s` and you are returned to the command line. Be patient!
diff --git a/content/en/building/guides/data/analytics/testing-dbt-models.md b/content/en/hosting/analytics/testing-dbt-models.md
similarity index 100%
rename from content/en/building/guides/data/analytics/testing-dbt-models.md
rename to content/en/hosting/analytics/testing-dbt-models.md
diff --git a/content/en/hosting/monitoring/setup.md b/content/en/hosting/monitoring/setup.md
index 8dfa80ac5..72be3c64f 100644
--- a/content/en/hosting/monitoring/setup.md
+++ b/content/en/hosting/monitoring/setup.md
@@ -119,7 +119,7 @@ docker compose up -d
 
 #### CHT Sync Data (Local)
 
-With the [release of 1.1.0](https://github.com/medic/cht-watchdog/releases/tag/1.1.0), Watchdog now supports easily ingesting [CHT Sync]({{< relref "building/guides/data/analytics/introduction" >}}) data read in from a Postgres database (supports Postgres `>= 9.x`).
+With the [release of 1.1.0](https://github.com/medic/cht-watchdog/releases/tag/1.1.0), Watchdog now supports easily ingesting [CHT Sync]({{< relref "hosting/analytics/introduction" >}}) data read in from a Postgres database (supports Postgres `>= 9.x`).
 
 1. Copy the example config file, so you can add the correct contents in them:
    ```shell
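This patch repoints every `ref`/`relref` that used `building/guides/data/analytics` to `hosting/analytics`. One way to confirm that no link to the old section path survives after applying it is to search the content tree for that literal path. The script below is a minimal sketch, assuming a POSIX shell and a `grep` that supports `-r` and `--include` (GNU and BSD `grep` both do); the `find_stale_refs` function and its default `content` root are illustrative assumptions, not part of the CHT docs tooling.

```shell
#!/bin/sh
# Hypothetical post-apply check (not part of the CHT docs tooling): list lines
# under a Hugo content tree that still mention the pre-move section path.

OLD_PATH="building/guides/data/analytics"

find_stale_refs() {
  # $1: root of the content tree to scan (defaults to "content").
  root="${1:-content}"
  # -r recurses, -n prints line numbers, -F matches OLD_PATH as a literal
  # string; only Markdown and HTML files are searched. grep exits with
  # status 1 when nothing matches, i.e. when no stale references remain.
  grep -rnF --include='*.md' --include='*.html' -e "$OLD_PATH" "$root"
}
```

Run from the repository root after `git am`, `find_stale_refs content` should print nothing and return exit status 1 if every reference was updated. Matching the literal path string instead of parsing shortcodes keeps the check simple, but it also reports plain-text mentions and any `aliases:` entries that deliberately keep the old URL as a Hugo redirect, so review each hit before treating it as a broken link.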