analytics folder move (#1609)
Phil-Mwago authored Oct 11, 2024
1 parent 45d1fb3 commit 5a984b2
Showing 16 changed files with 17 additions and 17 deletions.
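Every hunk in this commit makes the same mechanical change: the path prefix `building/guides/data/analytics` inside Hugo `ref`/`relref` shortcodes becomes `hosting/analytics`. A rewrite of this kind could be scripted roughly as below. This is a minimal sketch, not the method the author used; the function name `rewrite_refs`, the constants, and the regular expression are illustrative assumptions. It only touches shortcodes, so front-matter entries such as `relatedContent` would still need a separate edit.

```python
import re

# Assumed constants, taken from the paths visible in this diff.
OLD_PREFIX = "building/guides/data/analytics"
NEW_PREFIX = "hosting/analytics"

# Matches {{< ref "path" >}} and {{< relref "path" >}},
# with or without a space before the closing >}}.
SHORTCODE = re.compile(r'(\{\{<\s*(?:ref|relref)\s+")([^"]+)("\s*>\}\})')


def rewrite_refs(text, old=OLD_PREFIX, new=NEW_PREFIX):
    """Return text with the old path prefix replaced inside ref/relref shortcodes."""

    def swap(match):
        path = match.group(2)
        # Only rewrite an exact match or a true sub-path of the old prefix.
        if path == old or path.startswith(old + "/"):
            path = new + path[len(old):]
        return match.group(1) + path + match.group(3)

    return SHORTCODE.sub(swap, text)
```

Applied to each markdown file under `content/en`, a function like this would leave unrelated links such as `hosting/monitoring/setup` unchanged, since only paths that equal the old prefix or sit beneath it are rewritten.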
2 changes: 1 addition & 1 deletion content/en/building/guides/database/couch2pg-oom-errors.md
@@ -11,7 +11,7 @@ aliases:
---

{{% pageinfo %}}
-Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "building/guides/data/analytics" >}}).
+Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "hosting/analytics" >}}).
{{% /pageinfo %}}

Some times when couch2pg is replicating documents to postgres, it encounters very large info docs that are larger than the memory allocation of the document sync array and causes out-of-memory errors.
2 changes: 1 addition & 1 deletion content/en/building/tutorials/couch2pg-setup.md
@@ -11,7 +11,7 @@ aliases:
---

{{% pageinfo %}}
-Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "building/guides/data/analytics" >}}).
+Couch2pg is deprecated. For data synchronization, refer to [CHT Sync]({{< ref "hosting/analytics" >}}).
{{% /pageinfo %}}

This tutorial will take you through setting up a Couch2pg service.
4 changes: 2 additions & 2 deletions content/en/core/overview/cht-sync.md
@@ -9,13 +9,13 @@ aliases:
relatedContent: >
core/overview/architecture
core/overview/data-flows-for-analytics/
-building/guides/data/analytics/
+hosting/analytics/
---

## Overview
CHT Sync is an integrated solution designed to enable data synchronization between CouchDB and PostgreSQL for the purpose of analytics. It combines several technologies to achieve this synchronization and provides an efficient workflow for data processing and visualization. The synchronization occurs in near real-time, ensuring that the data displayed on dashboards is up-to-date.

-Read more about setting up [CHT Sync]({{< relref "building/guides/data/analytics/setup" >}}).
+Read more about setting up [CHT Sync]({{< relref "hosting/analytics/setup" >}}).

<!-- make updates to this diagram on the google slides: -->
<!-- https://docs.google.com/presentation/d/1j4jPsi-gHbiaLBfgYOyru1g_YV98PkBrx2zs7bwhoEQ/ -->
2 changes: 1 addition & 1 deletion content/en/core/overview/data-flows-for-analytics.md
@@ -46,7 +46,7 @@ Ultimately all the data ends up in a CouchDB instance deployed in the cloud whet

#### 2. Data Transformation

-[CHT Sync]({{< relref "core/overview/cht-sync" >}}) is used to move data from CouchDB to a relational database, PostgreSQL in this case. The choice of PostgreSQL for analytics dashboard data sources is to allow use of the more familiar SQL querying. It is an open source tool that can be [easily deployed]({{< ref "building/guides/data/analytics" >}}). When deployed the service uses [CouchDB's changes feed](https://docs.couchdb.org/en/stable/api/database/changes.html) which allows capturing of everything happening in CouchDB in incremental updates. It is run and monitored by the operating system where it is configured to fetch data at a configurable interval.
+[CHT Sync]({{< relref "core/overview/cht-sync" >}}) is used to move data from CouchDB to a relational database, PostgreSQL in this case. The choice of PostgreSQL for analytics dashboard data sources is to allow use of the more familiar SQL querying. It is an open source tool that can be [easily deployed]({{< ref "hosting/analytics" >}}). When deployed the service uses [CouchDB's changes feed](https://docs.couchdb.org/en/stable/api/database/changes.html) which allows capturing of everything happening in CouchDB in incremental updates. It is run and monitored by the operating system where it is configured to fetch data at a configurable interval.

Data copied over to PostgreSQL is first stored as raw json (document) making use of PostgreSQL's jsonb data type to create an exact replica of a CouchDB database. From this, default views are created at deployment of the service and refreshed during every subsequent run. Additional custom materialized views created later are also refreshed at this time.

2 changes: 1 addition & 1 deletion content/en/hosting/3.x/ec2-setup-guide.md
@@ -61,7 +61,7 @@ This guide will walk you through the process of creating an EC2 instance, mounti
- See [SSL Certificates]({{< relref "hosting/3.x/ssl-cert-install">}}) to install new certificates
1. Configure CHT Sync
-See the [CHT Sync configuration]({{< relref "building/guides/data/analytics">}}).
+See the [CHT Sync configuration]({{< relref "hosting/analytics">}}).
1. Setup postgres to work with CHT Sync
- Creating the database, setting up permissions, exploring the tables and what they store
File renamed without changes.
@@ -30,7 +30,7 @@ packages:
```
To avoid breaking changes in downstream models, include `revision` in the dependency, which should be a version tag for `cht-pipeline`.

-In CHT Sync config, set the URL of dbt GitHub repository to the `CHT_PIPELINE_BRANCH_URL` [environment variable]({{< relref "building/guides/data/analytics/environment-variables" >}}), either in `.env` if using `docker compose`, or in `values.yaml` if using Kubernetes.
+In CHT Sync config, set the URL of dbt GitHub repository to the `CHT_PIPELINE_BRANCH_URL` [environment variable]({{< relref "hosting/analytics/environment-variables" >}}), either in `.env` if using `docker compose`, or in `values.yaml` if using Kubernetes.

### Deploying models

@@ -48,7 +48,7 @@ When it is necessary to update the base models, update the version tag in the de

### Testing models and dashboards

-It is highly encouraged to write [dbt tests]({{< ref "building/guides/data/analytics/testing-dbt-models" >}}) for application-specific models to ensure that they are accurate and to avoid releasing broken models. Examples can be found in the [cht-pipeline repository](https://github.com/medic/cht-pipeline/tree/main/tests).
+It is highly encouraged to write [dbt tests]({{< ref "hosting/analytics/testing-dbt-models" >}}) for application-specific models to ensure that they are accurate and to avoid releasing broken models. Examples can be found in the [cht-pipeline repository](https://github.com/medic/cht-pipeline/tree/main/tests).


## Base Models
@@ -11,21 +11,21 @@ aliases:
- /apps/guides/data/analytics/couch2pg-to-cht-sync-migration
---

-This page outlines guidelines for migrating from [couch2pg](https://github.com/medic/cht-couch2pg) to the data pipeline based on CHT Sync. One of the main changes in this flow is separating the syncing process from the data transformation, with dbt now handling the latter in [cht-pipeline](https://github.com/medic/cht-pipeline/). This migration requires dbt models in the cht-pipeline repository instead of SQL views and tables. One thing to note is that the schema for CHT Sync differs from couch2pg, so dbt models will not directly replace the SQL views and tables. For instructions on how to get started with dbt models, refer to the [dbt models guide]({{< relref "building/guides/data/analytics/testing-dbt-models" >}}).
+This page outlines guidelines for migrating from [couch2pg](https://github.com/medic/cht-couch2pg) to the data pipeline based on CHT Sync. One of the main changes in this flow is separating the syncing process from the data transformation, with dbt now handling the latter in [cht-pipeline](https://github.com/medic/cht-pipeline/). This migration requires dbt models in the cht-pipeline repository instead of SQL views and tables. One thing to note is that the schema for CHT Sync differs from couch2pg, so dbt models will not directly replace the SQL views and tables. For instructions on how to get started with dbt models, refer to the [dbt models guide]({{< relref "hosting/analytics/testing-dbt-models" >}}).

## Key Considerations
- **Server resources**: To minimize downtime, running both couch2pg and CHT Sync in parallel during the migration process is recommended. With this in mind, ensure that the server and database resources are sufficient to handle the load.
- **dbt Modelling**: Avoid the temptation to model new dbt models after existing SQL views and tables. Instead, take the opportunity to re-evaluate the data needs and design new models that are more efficient and effective. Think of what data needs to be shown and how it should be shown in data visualization tools and use that to guide the design of the new models.
-- **Testing**: After migrating, thoroughly test the new dbt models to ensure that they work as expected. Refer to the [testing dbt models guide]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) for more information on testing.
+- **Testing**: After migrating, thoroughly test the new dbt models to ensure that they work as expected. Refer to the [testing dbt models guide]({{< relref "hosting/analytics/building-dbt-models" >}}) for more information on testing.
- **Feedback**: Provide any feedback and create issues for errors or bugs encountered in the [cht-sync](https://github.com/medic/cht-sync) and [cht-pipeline](https://github.com/medic/cht-pipeline/) repositories to improve the tools.

## Migration Steps
1. **Plan the migration**: Determine the scope of the migration, including the data sources, the data models, and the data transformations. Identify the existing SQL views, tables, and dashboards and assess what data you want to visualize.
-1. **Set up CHT Sync**: Follow the instructions [to setup CHT Sync locally]({{< relref "building/guides/data/analytics/setup" >}}).
-1. **Build dbt models**: Use the [dedicated guidelines]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) to build dbt models for the data you want to visualize.
-1. **Deploy CHT Sync**: Once the dbt models are tested locally and working as expected, deploy CHT Sync in production. Follow [the instructions to set up CHT Sync in production]({{< relref "building/guides/data/analytics/production" >}}). It is recommended that CHT Sync be run in parallel with couch2pg during the migration process. This minimises disruption to users of the existing dashboards because they can continue to use the existing data while the new pipeline is being set up. It also makes it easier to compare the data from couch2pg when testing the new pipeline.
+1. **Set up CHT Sync**: Follow the instructions [to setup CHT Sync locally]({{< relref "hosting/analytics/setup" >}}).
+1. **Build dbt models**: Use the [dedicated guidelines]({{< relref "hosting/analytics/building-dbt-models" >}}) to build dbt models for the data you want to visualize.
+1. **Deploy CHT Sync**: Once the dbt models are tested locally and working as expected, deploy CHT Sync in production. Follow [the instructions to set up CHT Sync in production]({{< relref "hosting/analytics/production" >}}). It is recommended that CHT Sync be run in parallel with couch2pg during the migration process. This minimises disruption to users of the existing dashboards because they can continue to use the existing data while the new pipeline is being set up. It also makes it easier to compare the data from couch2pg when testing the new pipeline.
1. **Create replica dashboards**: In the data visualization tool of your choice, create replica dashboards of your current setup and compare the data from the old and new pipelines.
1. **Test and adjust the dbt models**: Test the dbt models to ensure they are working as expected and that the replica and initial dashboards match. Adjust the models to ensure they are accurate.
-1. **Optimize**: Once the dbt models are working as expected and the dashboards display the expected data, optimize the models to improve performance. This may involve restructuring the models, adding indexes, or making other adjustments to improve the speed and efficiency of the models. Having a look at the [dbt models guide]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) will help you understand how to optimize the models.
+1. **Optimize**: Once the dbt models are working as expected and the dashboards display the expected data, optimize the models to improve performance. This may involve restructuring the models, adding indexes, or making other adjustments to improve the speed and efficiency of the models. Having a look at the [dbt models guide]({{< relref "hosting/analytics/building-dbt-models" >}}) will help you understand how to optimize the models.
1. **Set up monitoring and alerting**: [Setup CHT Watchdog]({{< relref "hosting/monitoring/setup" >}}) to monitor the running of CHT Sync and set up alerts for any failures.
1. **Remove couch2pg and the duplicate database**: Once the new pipeline runs as expected, you can remove couch2pg and the duplicate database. Ensure that all data is being synced correctly, that the dbt models are working as expected, and that the dashboards display the expected data before switching them off and removing couch2pg.
@@ -19,7 +19,7 @@ We recommend running [CHT Sync](https://github.com/medic/cht-sync) in production
- Helm: The Kubernetes package manager. You can install it using the [helm installation guide](https://helm.sh/docs/intro/install/).

## Database disk space requirements
-The disk space required for the database depends on a few things including the size the of CouchDB databases being replicated, and the [models]({{< relref "building/guides/data/analytics/building-dbt-models" >}}) defined. The database will grow over time as more data is added to CouchDB. The database should be monitored to ensure that it has enough space to accommodate the data. To get an idea of the size requirements of the database, you can replicate 10% of the data from CouchDB to Postgres and then run the following command to see disk usage:
+The disk space required for the database depends on a few things including the size the of CouchDB databases being replicated, and the [models]({{< relref "hosting/analytics/building-dbt-models" >}}) defined. The database will grow over time as more data is added to CouchDB. The database should be monitored to ensure that it has enough space to accommodate the data. To get an idea of the size requirements of the database, you can replicate 10% of the data from CouchDB to Postgres and then run the following command to see disk usage:
```shell
SELECT pg_size_pretty(pg_database_size('your_database_name'));
```
@@ -17,7 +17,7 @@ These instructions assume you're running CHT Sync, CHT Core and PostgreSQL eithe

## Setup

-Copy the values in `env.template` file to the `.env` file. For more information, see the references on the [Environment variables page]({{< relref "building/guides/data/analytics/environment-variables" >}}).
+Copy the values in `env.template` file to the `.env` file. For more information, see the references on the [Environment variables page]({{< relref "hosting/analytics/environment-variables" >}}).

{{% alert title="Note" %}}
The first time you run the commands from any of the sections below it will need to download many Docker images and will take a while. You'll know it's done when you see `#8 DONE 0.0s` and you are returned to the command line. Be patient!
2 changes: 1 addition & 1 deletion content/en/hosting/monitoring/setup.md
@@ -119,7 +119,7 @@ docker compose up -d

#### CHT Sync Data (Local)

-With the [release of 1.1.0](https://github.com/medic/cht-watchdog/releases/tag/1.1.0), Watchdog now supports easily ingesting [CHT Sync]({{< relref "building/guides/data/analytics/introduction" >}}) data read in from a Postgres database (supports Postgres `>= 9.x`).
+With the [release of 1.1.0](https://github.com/medic/cht-watchdog/releases/tag/1.1.0), Watchdog now supports easily ingesting [CHT Sync]({{< relref "hosting/analytics/introduction" >}}) data read in from a Postgres database (supports Postgres `>= 9.x`).

1. Copy the example config file, so you can add the correct contents in them:
```shell
