Update synthea and wdi metadata #451

juankx-bodo · 2025-10-31T02:36:21Z

Update the metadata with the AI-enriched version produced by the latest version of the metadata generation tool.
Sample values and all definitions are based on the actual full BIRD dataset.

john-sanchez31

Why are we changing this? For #457 I'm using the synthea metadata from S3, so it will no longer be needed in the repo (in fact, this metadata will no longer be in the repo at all). Also, I think we should wait for the new [run custom] flag, which runs the custom tests, so this can be tested properly.

john-sanchez31 · 2025-11-17T15:45:58Z

tests/test_metadata/synthea_graph.json

        "properties": [
          {
-            "name": "ITEM",
+            "name": "item",


Why are all properties being changed from uppercase to lowercase?

Names generated by the metadata tool are lowercase. No manual changes were made to these files.

john-sanchez31 · 2025-11-17T15:52:11Z

tests/test_metadata/world_development_indicators_graph.json

+            "name": "year_",
            "type": "table column",
-            "column name": "Year",
+            "column name": "\"Year\"",


Why is this quoted?

The metadata generation tool has Python, PyDough and SQL keyword lists. If a table path or column name matches one of those lists or contains special characters, the value is enclosed in double quotes and any double-quote character in the name is escaped. With the quoting fix in PyDough this should not be an issue.
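
For readers unfamiliar with the rule described above, here is a minimal sketch of how such quoting could work. It is not the tool's actual code: the function name `quote_if_needed`, the abbreviated `SQL_KEYWORDS` set, the case-insensitive keyword match, and the choice to escape embedded double quotes by doubling them (SQL style) are all assumptions made for illustration. The PyDough keyword list is omitted because it is not shown in this thread.

```python
import json
import keyword
import re

# Illustrative stand-ins only: the real tool's keyword lists are not shown here.
SQL_KEYWORDS = {"select", "from", "where", "order", "group", "year"}
RESERVED = {kw.lower() for kw in keyword.kwlist} | SQL_KEYWORDS

# A "plain" name is a letter or underscore followed by letters, digits, or underscores.
PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_if_needed(name: str) -> str:
    """Return the name unchanged, or double-quoted if it is reserved or has special characters."""
    is_reserved = name.lower() in RESERVED  # assumption: matching is case-insensitive
    has_special = PLAIN_NAME.fullmatch(name) is None
    if not (is_reserved or has_special):
        return name
    # Assumption: embedded double quotes are escaped by doubling them (SQL style).
    return '"' + name.replace('"', '""') + '"'


if __name__ == "__main__":
    for column in ["ITEM", "Year", "birth date"]:
        # json.dumps shows how the value would appear inside a JSON metadata file.
        print(column, "->", json.dumps(quote_if_needed(column)))
```

Under these assumptions, `Year` is quoted because it matches the illustrative keyword set, and serializing the quoted value to JSON produces `"\"Year\""`, which is the form seen in the diff above.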

hadia206 · 2025-11-17T23:51:00Z

I agree with John, do we need that after the changes he made in this PR?

juankx-bodo · 2025-11-18T14:50:25Z

Why are we changing this? For #457 I'm using the synthea metadata from S3, so it will no longer be needed in the repo (in fact, this metadata will no longer be in the repo at all). Also, I think we should wait for the new [run custom] flag, which runs the custom tests, so this can be tested properly.

This is actually true. With the reserved words dataset and tests from S3, we no longer need the synthea and WDI datasets in our repo. We can talk about this. Meanwhile, the purpose of the PR was to update the "original" JSON files with the latest version used by the LLM team, generated by the tool and enriched with the help of the BIRD CSV files.

juankx-bodo · 2025-11-18T15:08:47Z

I agree with John, do we need that after the changes he made in this PR?

The intention of the PR is to use the latest version of the JSON files, which the LLM team also uses. I also agree that we should not have the Synthea and WDI datasets in our repo: each of them is there only for one bug, and those bugs are also covered by the reserved words dataset. If we want to keep those tests, we could move them to use the S3 version of the datasets.

knassre-bodo · 2025-11-24T17:35:44Z

@juankx-bodo I'm still confused why we need this PR at all since we shouldn't be using the JSON files in the repo, they get downloaded from S3.

juankx-bodo added 4 commits November 12, 2025 08:14

Update synthea and WDI metadata

bc34390

Update synthea and WDI metadata

598e36c

Update synthea and WDI metadata

0fac096

[RUN CI]

e028c1d

juankx-bodo force-pushed the jkx/Update_synthea_and_WDI_metadata branch from 6b1b7a8 to e028c1d Compare November 12, 2025 14:15

juankx-bodo requested review from hadia206, john-sanchez31 and knassre-bodo November 12, 2025 15:27

john-sanchez31 reviewed Nov 17, 2025

hadia206 marked this pull request as draft December 23, 2025 01:00

juankx-bodo closed this Jan 5, 2026
