OpenHEXA Toolbox DHIS2

A utility library to acquire and process data from a DHIS2 instance.

pip install openhexa.toolbox

Credentials are required to initialize a connection to a DHIS2 instance, and must be provided through a Connection object.

In an OpenHEXA workspace (e.g. in an OpenHEXA pipeline or in an OpenHEXA notebook), a Connection object can be created using the OpenHEXA SDK by providing the identifier of the workspace connection.

OpenHEXA workspace connection

>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con)

Outside an OpenHEXA workspace, a connection can be created manually using the SDK by providing the instance URL, a username and a password.

>>> from openhexa.sdk.workspaces.connections import DHIS2Connection
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection outside an OpenHEXA workspace
>>> con = DHIS2Connection(url="https://play.dhis2.org/40.0.1", username="admin", password="district")
>>> dhis = DHIS2(con)

If needed, the OpenHEXA SDK dependency can be bypassed by providing a namedtuple instead of a Connection object.

>>> from collections import namedtuple
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection outside an OpenHEXA workspace
>>> Connection = namedtuple("Connection", ["url", "username", "password"])
>>> con = Connection(url="https://play.dhis2.org/40.0.1", username="admin", password="district")
>>> dhis = DHIS2(con)
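
When credentials should not be hard-coded, the namedtuple can be filled from environment variables. The following is a minimal sketch; the variable names `DHIS2_URL`, `DHIS2_USERNAME` and `DHIS2_PASSWORD` and the helper `connection_from_env` are assumptions made for illustration and are not defined by the toolbox.

```python
import os
from collections import namedtuple

# Same three fields as the namedtuple shown above.
Connection = namedtuple("Connection", ["url", "username", "password"])

# Hypothetical variable names, chosen for this example only.
REQUIRED = ("DHIS2_URL", "DHIS2_USERNAME", "DHIS2_PASSWORD")


def connection_from_env(environ=os.environ):
    """Build a Connection from environment variables, failing early if one is missing."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return Connection(
        url=environ["DHIS2_URL"],
        username=environ["DHIS2_USERNAME"],
        password=environ["DHIS2_PASSWORD"],
    )
```

The returned object exposes `url`, `username` and `password`, so it can be passed to `DHIS2(con)` in the same way as the namedtuple above.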

Caching can be activated by providing a cache directory when initializing a new connection.

>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con, cache_dir=".cache")

For now, the library caches only instance metadata; responses to data queries are not cached.
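
How the toolbox stores its cache is not described here. Purely as an illustration of the general idea behind a cache directory, the sketch below keeps one JSON file per key and only calls the fetch function on a cache miss. The names `cached_json` and `fake_fetch` are hypothetical and do not come from the library.

```python
import json
import tempfile
from pathlib import Path


def cached_json(cache_dir, key, fetch):
    """Return the JSON payload stored under `key`, calling `fetch()` only on a cache miss."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    payload = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload


calls = []


def fake_fetch():
    # Stand-in for a metadata request to a DHIS2 instance.
    calls.append(1)
    return [{"id": "abc", "name": "Example"}]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_json(tmp, "organisation_units", fake_fetch)
    second = cached_json(tmp, "organisation_units", fake_fetch)

print(len(calls))  # 1: the second call is served from the file
```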

Instance metadata can be accessed through a set of methods under the DHIS2.meta namespace. Metadata are always returned as JSON-like objects that can easily be converted into Pandas or Polars dataframes.

>>> import polars as pl
>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con, cache_dir=".cache")

>>> # read organisation units metadata
>>> org_units = dhis.meta.organisation_units()
>>> df = pl.DataFrame(org_units)

>>> print(df)

shape: (1_332, 5)
┌─────────────┬──────────────────────┬───────┬─────────────────────────────────┬───────────────────┐
│ id          ┆ name                 ┆ level ┆ path                            ┆ geometry          │
│ ---         ┆ ---                  ┆ ---   ┆ ---                             ┆ ---               │
│ str         ┆ str                  ┆ i64   ┆ str                             ┆ str               │
╞═════════════╪══════════════════════╪═══════╪═════════════════════════════════╪═══════════════════╡
│ Rp268JB6Ne4 ┆ Adonkia CHP          ┆ 4     ┆ /ImspTQPwCqd/at6UHUQatSo/qtr8GG ┆ null              │
│             ┆                      ┆       ┆ l…                              ┆                   │
│ cDw53Ej8rju ┆ Afro Arab Clinic     ┆ 4     ┆ /ImspTQPwCqd/at6UHUQatSo/qtr8GG ┆ null              │
│             ┆                      ┆       ┆ l…                              ┆                   │
│ GvFqTavdpGE ┆ Agape CHP            ┆ 4     ┆ /ImspTQPwCqd/O6uvpzGd5pu/U6Kr7G ┆ null              │
│             ┆                      ┆       ┆ t…                              ┆                   │
│ plnHVbJR6p4 ┆ Ahamadyya Mission Cl ┆ 4     ┆ /ImspTQPwCqd/PMa2VCrupOd/QywkxF ┆ {"type": "Point", │
│             ┆                      ┆       ┆ u…                              ┆ "coordinates":…   │
│ …           ┆ …                    ┆ …     ┆ …                               ┆ …                 │
│ hDW65lFySeF ┆ Youndu CHP           ┆ 4     ┆ /ImspTQPwCqd/jmIPBj66vD6/Z9QaI6 ┆ null              │
│             ┆                      ┆       ┆ s…                              ┆                   │
│ Urk55T8KgpT ┆ Yoyah CHP            ┆ 4     ┆ /ImspTQPwCqd/jUb8gELQApl/yu4N82 ┆ null              │
│             ┆                      ┆       ┆ F…                              ┆                   │
│ VdXuxcNkiad ┆ Yoyema MCHP          ┆ 4     ┆ /ImspTQPwCqd/jmIPBj66vD6/USQdmv ┆ {"type": "Point", │
│             ┆                      ┆       ┆ r…                              ┆ "coordinates":…   │
│ BNFrspDBKel ┆ Zimmi CHC            ┆ 4     ┆ /ImspTQPwCqd/bL4ooGhyHRQ/BD9gU0 ┆ {"type": "Point", │
│             ┆                      ┆       ┆ G…                              ┆ "coordinates":…   │
└─────────────┴──────────────────────┴───────┴─────────────────────────────────┴───────────────────┘
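
Because metadata come back as plain lists of dictionaries, they can also be handled without a dataframe library. The sketch below uses made-up records with the same keys as the table above (`id`, `name`, `level`, `path`, `geometry`); the identifiers and coordinates are placeholders, not values from a real instance.

```python
import json

# Illustrative records shaped like the organisation units above (placeholder values).
org_units = [
    {"id": "ou_a", "name": "Clinic A", "level": 4, "path": "/root/d1/c1/ou_a", "geometry": None},
    {"id": "ou_b", "name": "Clinic B", "level": 4, "path": "/root/d1/c2/ou_b",
     "geometry": '{"type": "Point", "coordinates": [-11.5, 8.4]}'},
    {"id": "d1", "name": "District 1", "level": 2, "path": "/root/d1", "geometry": None},
]

# Map identifiers to names for quick lookups.
names = {ou["id"]: ou["name"] for ou in org_units}

# Keep only the units at a given level of the hierarchy (level 4 here).
facilities = [ou for ou in org_units if ou["level"] == 4]

# Geometries appear as JSON strings in the table, so decode them when present.
points = {ou["id"]: json.loads(ou["geometry"]) for ou in org_units if ou["geometry"]}
```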

The following metadata types are supported:

  • DHIS2.meta.system_info()
  • DHIS2.meta.organisation_units()
  • DHIS2.meta.organisation_unit_groups()
  • DHIS2.meta.organisation_unit_levels()
  • DHIS2.meta.datasets()
  • DHIS2.meta.data_elements()
  • DHIS2.meta.data_element_groups()
  • DHIS2.meta.indicators()
  • DHIS2.meta.indicator_groups()
  • DHIS2.meta.category_option_combos()

Data can be accessed through two distinct endpoints: dataValueSets and analytics. The dataValueSets endpoint returns the raw data values stored in the DHIS2 database, while the analytics endpoint returns aggregated data stored in the DHIS2 analytics tables.

Raw data values can be read using the DHIS2.data_value_sets.get() method. The method accepts the following arguments:

  • data_elements : list of str, optional
    Data element identifiers (requires DHIS2 >= 2.39)

  • datasets : list of str, optional
    Dataset identifiers

  • data_element_groups : list of str, optional
    Data element groups identifiers

  • periods : list of str, optional
    Period identifiers in ISO format

  • start_date : str, optional
    Start date for the time span of the values to export

  • end_date : str, optional
    End date for the time span of the values to export

  • org_units : list of str, optional
    Organisation units identifiers

  • org_unit_groups : list of str, optional
    Organisation unit groups identifiers

  • children : bool, optional (default=False)
    Whether to include the children of the given organisation units in the hierarchy

  • attribute_option_combos : list of str, optional
    Attribute option combos identifiers

  • last_updated : str, optional
    Include only data values that were updated since the given timestamp

  • last_updated_duration : str, optional
    Include only data values that were updated within the given duration. The format is <value><time-unit>, where the supported time units are "d" (days), "h" (hours), "m" (minutes) and "s" (seconds).
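
To make the duration format of last_updated_duration concrete, here is a small sketch that converts such a string into a `timedelta`. It assumes the duration is written as an integer immediately followed by one unit letter (for example "10d" or "36h"); `parse_duration` is a hypothetical helper, not part of the toolbox.

```python
import re
from datetime import timedelta

# Unit letters listed above, mapped to timedelta keyword arguments.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Convert a duration such as "10d" or "36h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})
```

For instance, `parse_duration("10d")` gives a ten-day `timedelta`, which can be subtracted from the current time to see which updates would fall inside the window.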

At least 3 arguments must be provided:

  • One in the data dimension (data_elements, data_element_groups, or datasets)
  • One in the spatial dimension (org_units or org_unit_groups)
  • One in the temporal dimension (periods, or both start_date and end_date)
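
The three-dimension rule can be checked before sending a request. The sketch below is one possible pre-flight check written for this guide; `missing_dimensions` is a hypothetical helper and does not describe how the library validates its arguments.

```python
DATA_ARGS = ("data_elements", "data_element_groups", "datasets")
SPATIAL_ARGS = ("org_units", "org_unit_groups")


def missing_dimensions(**kwargs):
    """Return the names of the dimensions for which no usable argument was given."""
    missing = []
    if not any(kwargs.get(name) for name in DATA_ARGS):
        missing.append("data")
    if not any(kwargs.get(name) for name in SPATIAL_ARGS):
        missing.append("spatial")
    has_periods = bool(kwargs.get("periods"))
    # A date range only counts when both bounds are present.
    has_dates = bool(kwargs.get("start_date")) and bool(kwargs.get("end_date"))
    if not (has_periods or has_dates):
        missing.append("temporal")
    return missing
```

An empty list means the query covers all three dimensions; otherwise the returned names say which ones still need an argument.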

Data values are returned in a JSON-like list of dictionaries that can be converted into a Pandas or Polars dataframe.

>>> import polars as pl
>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con, cache_dir=".cache")

>>> data_values = dhis.data_value_sets.get(
...     datasets=["QX4ZTUbOt3a"],
...     org_units=["JQr6TJx5KE3", "KbO0JnhiMwl", "f90eISKFm7P"],
...     start_date="2022-01-01",
...     end_date="2022-04-01"
... )

>>> print(len(data_values))
301

>>> print(data_values[0])
{
    'dataElement': 'zzHwXqxKYy1', 'period': '202201', 'orgUnit': 'JQr6TJx5KE3', 'categoryOptionCombo': 'r8xySVHExGT', 'attributeOptionCombo': 'HllvX50cXC0', 'value': '2', 'storedBy': 'kailahun1', 'created': '2010-03-07T00:00:00.000+0000', 'lastUpdated': '2010-03-07T00:00:00.000+0000', 'comment': '', 'followup': False
}

>>> df = pl.DataFrame(data_values)
>>> print(df)

shape: (301, 11)
┌────────────┬────────┬────────────┬────────────┬───┬────────────┬────────────┬─────────┬──────────┐
│ dataElemen ┆ period ┆ orgUnit    ┆ categoryOp ┆ … ┆ created    ┆ lastUpdate ┆ comment ┆ followup │
│ t          ┆ ---    ┆ ---        ┆ tionCombo  ┆   ┆ ---        ┆ d          ┆ ---     ┆ ---      │
│ ---        ┆ str    ┆ str        ┆ ---        ┆   ┆ str        ┆ ---        ┆ str     ┆ bool     │
│ str        ┆        ┆            ┆ str        ┆   ┆            ┆ str        ┆         ┆          │
╞════════════╪════════╪════════════╪════════════╪═══╪════════════╪════════════╪═════════╪══════════╡
│ zzHwXqxKYy ┆ 202201 ┆ JQr6TJx5KE ┆ r8xySVHExG ┆ … ┆ 2010-03-07 ┆ 2010-03-07 ┆         ┆ false    │
│ 1          ┆        ┆ 3          ┆ T          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ zzHwXqxKYy ┆ 202201 ┆ JQr6TJx5KE ┆ cBQmyRrEKo ┆ … ┆ 2010-03-07 ┆ 2010-03-07 ┆         ┆ false    │
│ 1          ┆        ┆ 3          ┆ 3          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ zzHwXqxKYy ┆ 202201 ┆ JQr6TJx5KE ┆ U1PHVSShuW ┆ … ┆ 2010-03-07 ┆ 2010-03-07 ┆         ┆ false    │
│ 1          ┆        ┆ 3          ┆ j          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ zzHwXqxKYy ┆ 202201 ┆ f90eISKFm7 ┆ dcguXUTwen ┆ … ┆ 2010-03-12 ┆ 2010-03-12 ┆         ┆ false    │
│ 1          ┆        ┆ P          ┆ I          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ …          ┆ …      ┆ …          ┆ …          ┆ … ┆ …          ┆ …          ┆ …       ┆ …        │
│ h8vtacmZL5 ┆ 202203 ┆ f90eISKFm7 ┆ bckzBoAurH ┆ … ┆ 2010-05-21 ┆ 2010-05-21 ┆         ┆ false    │
│ j          ┆        ┆ P          ┆ I          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ h8vtacmZL5 ┆ 202203 ┆ f90eISKFm7 ┆ TDb5JyDQqh ┆ … ┆ 2010-05-21 ┆ 2010-05-21 ┆         ┆ false    │
│ j          ┆        ┆ P          ┆ o          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ h8vtacmZL5 ┆ 202203 ┆ f90eISKFm7 ┆ y1jbXYIuub ┆ … ┆ 2010-05-21 ┆ 2010-05-21 ┆         ┆ false    │
│ j          ┆        ┆ P          ┆ N          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
│ h8vtacmZL5 ┆ 202203 ┆ f90eISKFm7 ┆ x1Ti1RoTKF ┆ … ┆ 2010-05-21 ┆ 2010-05-21 ┆         ┆ false    │
│ j          ┆        ┆ P          ┆ r          ┆   ┆ T00:00:00. ┆ T00:00:00. ┆         ┆          │
│            ┆        ┆            ┆            ┆   ┆ 000+0000   ┆ 000+0000   ┆         ┆          │
└────────────┴────────┴────────────┴────────────┴───┴────────────┴────────────┴─────────┴──────────┘
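
Note that the `value` field is a string (`'2'` in the record printed above), so it has to be cast before any arithmetic. The sketch below sums values per data element and period using only the standard library. The records are trimmed to four fields and their values are illustrative; the cast to `float` assumes the data elements are numeric.

```python
from collections import defaultdict

# Records shaped like the dictionaries returned above (fields trimmed, values illustrative).
data_values = [
    {"dataElement": "zzHwXqxKYy1", "period": "202201", "orgUnit": "JQr6TJx5KE3", "value": "2"},
    {"dataElement": "zzHwXqxKYy1", "period": "202201", "orgUnit": "f90eISKFm7P", "value": "5"},
    {"dataElement": "zzHwXqxKYy1", "period": "202202", "orgUnit": "JQr6TJx5KE3", "value": "3"},
]

totals = defaultdict(float)
for row in data_values:
    # Values arrive as strings, so cast before summing.
    totals[(row["dataElement"], row["period"])] += float(row["value"])
```

With a dataframe library the same step is a cast of the `value` column followed by a group-by on `dataElement` and `period`.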

Aggregated data from the Analytics tables can be read using the DHIS2.analytics.get() method. The method accepts the following arguments:

  • data_elements : list of str, optional
    Data element identifiers

  • data_element_groups : list of str, optional
    Data element groups identifiers

  • indicators : list of str, optional
    Indicator identifiers

  • indicator_groups : list of str, optional
    Indicator groups identifiers
    Indicator groups identifiers

  • periods : list of str, optional
    Period identifiers in ISO format

  • org_units : list of str, optional
    Organisation units identifiers

  • org_unit_groups : list of str, optional
    Organisation unit groups identifiers

  • org_unit_levels : list of int, optional
    Organisation unit levels

  • include_cocs : bool, optional (default=True)
    Include category option combos in response

At least 3 arguments must be provided:

  • One in the data dimension (data_elements, data_element_groups, indicators or indicator_groups)
  • One in the spatial dimension (org_units, org_unit_groups or org_unit_levels)
  • One in the temporal dimension (periods)

Data values are returned in a JSON-like list of dictionaries that can be converted into a Pandas or Polars dataframe.

>>> import polars as pl
>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con, cache_dir=".cache")

>>> data_values = dhis.analytics.get(
...     data_elements=["V37YqbqpEhV", "tn3p7vIxoKY", "HZSdnO5fCUc"],
...     org_units=["JQr6TJx5KE3", "KbO0JnhiMwl", "f90eISKFm7P"],
...     periods=["202201", "202202", "202203"]
... )

>>> df = pl.DataFrame(data_values)
>>> print(df)

shape: (14, 5)
┌─────────────┬─────────────┬─────────────┬────────┬───────┐
│ dx          ┆ co          ┆ ou          ┆ pe     ┆ value │
│ ---         ┆ ---         ┆ ---         ┆ ---    ┆ ---   │
│ str         ┆ str         ┆ str         ┆ str    ┆ str   │
╞═════════════╪═════════════╪═════════════╪════════╪═══════╡
│ V37YqbqpEhV ┆ PT59n8BQbqM ┆ JQr6TJx5KE3 ┆ 202201 ┆ 5     │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ f90eISKFm7P ┆ 202201 ┆ 4     │
│ V37YqbqpEhV ┆ PT59n8BQbqM ┆ f90eISKFm7P ┆ 202201 ┆ 11    │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ JQr6TJx5KE3 ┆ 202201 ┆ 2     │
│ …           ┆ …           ┆ …           ┆ …      ┆ …     │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ KbO0JnhiMwl ┆ 202203 ┆ 12    │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ JQr6TJx5KE3 ┆ 202203 ┆ 5     │
│ V37YqbqpEhV ┆ PT59n8BQbqM ┆ JQr6TJx5KE3 ┆ 202203 ┆ 8     │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ f90eISKFm7P ┆ 202203 ┆ 13    │
└─────────────┴─────────────┴─────────────┴────────┴───────┘

Helper methods that add name columns alongside the identifier columns are available under the DHIS2.meta namespace:

  • DHIS2.meta.add_dx_name_column()
  • DHIS2.meta.add_coc_name_column()
  • DHIS2.meta.add_org_unit_name_column()

>>> import polars as pl
>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con, cache_dir=".cache")

>>> data_values = dhis.analytics.get(
...     data_elements=["V37YqbqpEhV", "tn3p7vIxoKY", "HZSdnO5fCUc"],
...     org_units=["JQr6TJx5KE3", "KbO0JnhiMwl", "f90eISKFm7P"],
...     periods=["202201", "202202", "202203"]
... )

>>> df = pl.DataFrame(data_values)
>>> df = dhis.meta.add_dx_name_column(df)
>>> print(df)

shape: (14, 6)
┌─────────────┬─────────────┬─────────────┬────────┬───────┬───────────────────────────┐
│ dx          ┆ co          ┆ ou          ┆ pe     ┆ value ┆ dx_name                   │
│ ---         ┆ ---         ┆ ---         ┆ ---    ┆ ---   ┆ ---                       │
│ str         ┆ str         ┆ str         ┆ str    ┆ str   ┆ str                       │
╞═════════════╪═════════════╪═════════════╪════════╪═══════╪═══════════════════════════╡
│ V37YqbqpEhV ┆ PT59n8BQbqM ┆ JQr6TJx5KE3 ┆ 202201 ┆ 5     ┆ IPT 2nd dose given at PHU │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ f90eISKFm7P ┆ 202201 ┆ 4     ┆ IPT 2nd dose given at PHU │
│ V37YqbqpEhV ┆ PT59n8BQbqM ┆ f90eISKFm7P ┆ 202201 ┆ 11    ┆ IPT 2nd dose given at PHU │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ JQr6TJx5KE3 ┆ 202201 ┆ 2     ┆ IPT 2nd dose given at PHU │
│ …           ┆ …           ┆ …           ┆ …      ┆ …     ┆ …                         │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ KbO0JnhiMwl ┆ 202203 ┆ 12    ┆ IPT 2nd dose given at PHU │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ JQr6TJx5KE3 ┆ 202203 ┆ 5     ┆ IPT 2nd dose given at PHU │
│ V37YqbqpEhV ┆ PT59n8BQbqM ┆ JQr6TJx5KE3 ┆ 202203 ┆ 8     ┆ IPT 2nd dose given at PHU │
│ V37YqbqpEhV ┆ pq2XI5kz2BY ┆ f90eISKFm7P ┆ 202203 ┆ 13    ┆ IPT 2nd dose given at PHU │
└─────────────┴─────────────┴─────────────┴────────┴───────┴───────────────────────────┘
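
Conceptually, adding a name column amounts to looking up each identifier in a mapping from identifiers to names. The sketch below shows that lookup on plain dictionaries, reusing one row from the table above. It is not the library's implementation, and `add_name` is a hypothetical helper.

```python
def add_name(rows, names, id_key="dx", name_key="dx_name"):
    """Return copies of `rows` with a name looked up in `names` (None when unknown)."""
    return [{**row, name_key: names.get(row[id_key])} for row in rows]


# One row and one name taken from the output shown above.
rows = [{"dx": "V37YqbqpEhV", "co": "PT59n8BQbqM", "ou": "JQr6TJx5KE3", "pe": "202201", "value": "5"}]
names = {"V37YqbqpEhV": "IPT 2nd dose given at PHU"}

named = add_name(rows, names)
```

The input rows are left unchanged, so the original identifiers remain available for later joins.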

A helper method that adds the full organisation unit pyramid to a dataframe is available under the DHIS2.meta namespace:

  • DHIS2.meta.add_org_unit_parent_columns()

>>> import polars as pl
>>> from openhexa.sdk import workspace
>>> from openhexa.toolbox.dhis2 import DHIS2

>>> # initialize a new connection in an OpenHEXA workspace
>>> con = workspace.dhis2_connection("DHIS2_PLAY")
>>> dhis = DHIS2(con, cache_dir=".cache")

>>> data_values = dhis.analytics.get(
...     data_elements=["V37YqbqpEhV", "tn3p7vIxoKY", "HZSdnO5fCUc"],
...     org_units=["JQr6TJx5KE3", "KbO0JnhiMwl", "f90eISKFm7P"],
...     periods=["202201", "202202", "202203"]
... )

>>> df = pl.DataFrame(data_values)
>>> df = dhis.meta.add_org_unit_parent_columns(df)
>>> print(df)

shape: (14, 11)
┌────────────┬────────────┬───────────┬────────┬───┬───────────┬───────────┬───────────┬───────────┐
│ dx         ┆ co         ┆ ou        ┆ pe     ┆ … ┆ parent_le ┆ parent_le ┆ parent_le ┆ parent_le │
│ ---        ┆ ---        ┆ ---       ┆ ---    ┆   ┆ vel_2_id  ┆ vel_2_nam ┆ vel_3_id  ┆ vel_3_nam │
│ str        ┆ str        ┆ str       ┆ str    ┆   ┆ ---       ┆ e         ┆ ---       ┆ e         │
│            ┆            ┆           ┆        ┆   ┆ str       ┆ ---       ┆ str       ┆ ---       │
│            ┆            ┆           ┆        ┆   ┆           ┆ str       ┆           ┆ str       │
╞════════════╪════════════╪═══════════╪════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡
│ V37YqbqpEh ┆ PT59n8BQbq ┆ JQr6TJx5K ┆ 202201 ┆ … ┆ jUb8gELQA ┆ Kailahun  ┆ cM2BKSrj9 ┆ Luawa     │
│ V          ┆ M          ┆ E3        ┆        ┆   ┆ pl        ┆           ┆ F9        ┆           │
│ V37YqbqpEh ┆ pq2XI5kz2B ┆ f90eISKFm ┆ 202201 ┆ … ┆ kJq2mPyFE ┆ Kenema    ┆ vzup1f6yn ┆ Small Bo  │
│ V          ┆ Y          ┆ 7P        ┆        ┆   ┆ Ho        ┆           ┆ ON        ┆           │
│ V37YqbqpEh ┆ PT59n8BQbq ┆ f90eISKFm ┆ 202201 ┆ … ┆ kJq2mPyFE ┆ Kenema    ┆ vzup1f6yn ┆ Small Bo  │
│ V          ┆ M          ┆ 7P        ┆        ┆   ┆ Ho        ┆           ┆ ON        ┆           │
│ V37YqbqpEh ┆ pq2XI5kz2B ┆ JQr6TJx5K ┆ 202201 ┆ … ┆ jUb8gELQA ┆ Kailahun  ┆ cM2BKSrj9 ┆ Luawa     │
│ V          ┆ Y          ┆ E3        ┆        ┆   ┆ pl        ┆           ┆ F9        ┆           │
│ …          ┆ …          ┆ …         ┆ …      ┆ … ┆ …         ┆ …         ┆ …         ┆ …         │
│ V37YqbqpEh ┆ pq2XI5kz2B ┆ KbO0JnhiM ┆ 202203 ┆ … ┆ PMa2VCrup ┆ Kambia    ┆ QywkxFudX ┆ Magbema   │
│ V          ┆ Y          ┆ wl        ┆        ┆   ┆ Od        ┆           ┆ rC        ┆           │
│ V37YqbqpEh ┆ pq2XI5kz2B ┆ JQr6TJx5K ┆ 202203 ┆ … ┆ jUb8gELQA ┆ Kailahun  ┆ cM2BKSrj9 ┆ Luawa     │
│ V          ┆ Y          ┆ E3        ┆        ┆   ┆ pl        ┆           ┆ F9        ┆           │
│ V37YqbqpEh ┆ PT59n8BQbq ┆ JQr6TJx5K ┆ 202203 ┆ … ┆ jUb8gELQA ┆ Kailahun  ┆ cM2BKSrj9 ┆ Luawa     │
│ V          ┆ M          ┆ E3        ┆        ┆   ┆ pl        ┆           ┆ F9        ┆           │
│ V37YqbqpEh ┆ pq2XI5kz2B ┆ f90eISKFm ┆ 202203 ┆ … ┆ kJq2mPyFE ┆ Kenema    ┆ vzup1f6yn ┆ Small Bo  │
│ V          ┆ Y          ┆ 7P        ┆        ┆   ┆ Ho        ┆           ┆ ON        ┆           │
└────────────┴────────────┴───────────┴────────┴───┴───────────┴───────────┴───────────┴───────────┘
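
The parent identifiers in this output correspond to the ancestors of each organisation unit. Assuming the `path` field of the organisation units metadata lists identifiers from the root down to the unit itself, separated by "/", the ancestors at each level can be read directly from it. The sketch below illustrates that idea with placeholder identifiers; `parents_from_path` is a hypothetical helper and may differ from what the library does internally.

```python
def parents_from_path(path):
    """Map hierarchy level (1 = root) to the identifier found at that level of `path`."""
    ids = [part for part in path.split("/") if part]
    # The last element is the unit itself; everything before it is an ancestor.
    return {level: uid for level, uid in enumerate(ids[:-1], start=1)}


# Placeholder identifiers; real paths contain DHIS2 UIDs.
print(parents_from_path("/root/district/chiefdom/facility"))
# {1: 'root', 2: 'district', 3: 'chiefdom'}
```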

In development.

Helper classes and methods to deal with DHIS2 periods are available in the openhexa.toolbox.dhis2.periods module.

>>> from openhexa.toolbox.dhis2.periods import Month, Quarter, period_from_string

>>> m1 = Month("202211")
>>> m2 = Month("202302")
>>> m2 > m1
True

>>> m1.get_range(m2)
["202211", "202212", "202301", "202302"]

>>> q1 = Quarter("2022Q3")
>>> q2 = Quarter("2023Q2")
>>> q1.get_range(q2)
["2022Q3", "2022Q4", "2023Q1", "2023Q2"]

>>> period_from_string("2022Q3") == q1
True
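
To show what a range of monthly periods involves, here is a standard-library sketch that enumerates YYYYMM identifiers between two months, both included. It mirrors the result of the `get_range` example above but is not the module's implementation; `month_range` is a hypothetical name.

```python
def month_range(start, end):
    """List monthly periods (YYYYMM) from `start` to `end`, both included."""
    year, month = int(start[:4]), int(start[4:6])
    end_year, end_month = int(end[:4]), int(end[4:6])
    periods = []
    while (year, month) <= (end_year, end_month):
        periods.append(f"{year:04d}{month:02d}")
        month += 1
        if month > 12:
            # Roll over to January of the next year.
            year, month = year + 1, 1
    return periods


print(month_range("202211", "202302"))
# ['202211', '202212', '202301', '202302']
```

A list built this way can be passed as the periods argument of a data query.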