Run dbt models only on the relevant databases #176
Comments
This is a serious shift from the way cht-sync works now and the way couch2pg worked before, and would likely require a significant overhaul of how data is managed. I don't think this is a straightforward technical issue, because one solution requires having multiple data source tables in Postgres, one for each medic database, or at least keeping medic separate from the others. Another solution could be to store which database each document was copied from, and update the base models to exclude non-medic docs.
Seems like the simplest solution, and I don't think it will be a significant performance issue to have more rows in the source table which are then ignored in all downstream models.
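A minimal sketch of the "store which database each document was copied from" idea, using an in-memory SQLite table in place of Postgres. The table name `source_docs`, the `source` column, and both helper functions are hypothetical names chosen for illustration; they are not taken from cht-sync or cht-pipeline.

```python
import json
import sqlite3


def load_source_table(conn, docs_by_database):
    """Copy documents from every database into one table, tagging each row with its origin."""
    # Hypothetical layout: one shared table with a `source` column.
    conn.execute("CREATE TABLE source_docs (id TEXT, source TEXT, doc TEXT)")
    for database, docs in docs_by_database.items():
        conn.executemany(
            "INSERT INTO source_docs (id, source, doc) VALUES (?, ?, ?)",
            [(doc["_id"], database, json.dumps(doc)) for doc in docs],
        )


def base_model_rows(conn, database="medic"):
    """Stand-in for a base model: keep only the rows copied from one database."""
    cursor = conn.execute(
        "SELECT id FROM source_docs WHERE source = ? ORDER BY id", (database,)
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
load_source_table(conn, {
    "medic": [{"_id": "contact-1"}, {"_id": "report-1"}],
    "medic-sentinel": [{"_id": "task-1"}],
    "medic-users-meta": [{"_id": "feedback-1"}],
})
medic_ids = base_model_rows(conn)
```

In this sketch the extra rows stay in the shared table, and everything downstream of the filtered base model never sees them, which is the trade-off described in the comment above.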
Is it not possible on cht-sync to store data in two separate tables? As under the hood cht-sync also uses cht-couch2pg to sync data, is it not possible to insert
I am not suggesting storing the data twice. The main concern of this issue is to run only the dbt models that are required or useful for a given data set. It does not make sense to run models like contact or patient on a database other than
If anyone wrote models or queries directly against the main database, those would be broken. So it's a potentially breaking change.
Not to mention that migrating to the new "structure" will require a re-sync and model rebuild for all projects that have already deployed cht-sync.
We added
Describe the issue
It looks like dbt models are currently run on all the synced CouchDB databases. While there's an issue with cht-sync syncing multiple databases (#165), if you specify any database other than `medic`, it would still sync. CouchDB has separate databases for storing different types of data, and running the default dbt models on all databases is not helpful and can actually cause a performance problem.
Describe the improvement you'd like
We should separate the CouchDB databases and the models we want to run on them. For example, the dbt models you want to run on `medic` will be totally different from those for the `medic-users-meta` or `medic-sentinel` databases.
This is also critical from the performance point of view. There's no point in running dbt models where they are not going to make any changes. The `medic-users-meta` and `medic-sentinel` dbs could be as big as `medic`, and running `medic`'s models on those databases is not helpful.
Related:
medic/cht-pipeline#168 is related to this, as it looks like we currently only have models defined for the `medic` database.
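The separation proposed in the issue description could be sketched roughly as a lookup from database name to the set of models to run. This is only an illustration: the tag names (`tag:medic`, `tag:sentinel`, `tag:users-meta`), the `MODEL_SELECTORS` mapping, and the `dbt_commands` helper are all assumptions, not existing cht-sync or cht-pipeline configuration. The only real interface relied on is dbt's `dbt run --select <selector>` command line.

```python
# Hypothetical mapping from CouchDB database name to a dbt node selector.
# The tags are assumed to have been applied to the relevant models.
MODEL_SELECTORS = {
    "medic": "tag:medic",
    "medic-sentinel": "tag:sentinel",
    "medic-users-meta": "tag:users-meta",
}


def dbt_commands(synced_databases, selectors=MODEL_SELECTORS):
    """Build one `dbt run` command per synced database that has models defined."""
    commands = []
    for database in synced_databases:
        selector = selectors.get(database)
        if selector is None:
            # No models are defined for this database, so nothing is run on it.
            continue
        commands.append(["dbt", "run", "--select", selector])
    return commands


# "some-other-db" is a placeholder for a synced database with no models.
commands = dbt_commands(["medic", "medic-sentinel", "some-other-db"])
```

Each resulting list could be passed to something like `subprocess.run`, so that a database with no matching selector never triggers a model run at all.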