cleanup task along with validation flag #1602

Merged
Avantika-Singh16 merged 5 commits into datacommonsorg:master from Avantika-Singh16:cleanup_config_override on Sep 10, 2025

Conversation

@Avantika-Singh16

Contributor

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Summary of Changes

Hello @Avantika-Singh16, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the robustness and maintainability of data import configurations across various datasets. The changes streamline contact information, improve data source traceability, and refine validation and resource management settings for automated data pipelines. The overall goal is to ensure more consistent and efficient data processing.

Highlights

  • Curator Email Standardization: The curator_emails field has been updated across numerous manifest.json files, standardizing the contact email to support@datacommons.org for various datasets, including COVID-19 data, Eurostat statistics, Google mobility, and others.
  • Source File Specification: Many import_specifications now include a source_files array, explicitly listing the input files required for the data import process. This improves clarity and dependency tracking for automated data pipelines.
  • Validation Flag Configuration: A new config_override section with ignore_validation_status: false has been added to several manifest.json files, primarily for Eurostat, NOAA, OECD, and US Census datasets. Conversely, deprecated validation-related flags (invoke_import_tool, invoke_differ_tool, invoke_import_validation) have been removed from some FBI and US NCES manifests.
  • Resource Limit Adjustments and Cleanup: Resource limits (cpu, memory, disk) have been added to some manifests (e.g., Google COVID Mobility, US EPA National Emissions Inventory) and removed from others (e.g., Google Sustainability Financial Incentives, NCSES, Singapore Census, US Crash FARS data), indicating a review and optimization of resource allocation for data processing tasks.
  • Manifest Structure Refinements: Minor structural cleanups include the removal of a top-level import_name field in one CDC manifest and the removal of test scripts from the scripts array in Google COVID Mobility and PCM-DPC COVID-19 manifests.
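
For orientation, the fields named in the highlights above might fit together as in the following sketch. Only the field names (curator_emails, source_files, config_override, ignore_validation_status, resource_limits with cpu, memory, and disk) come from the summary; the nesting, the import name, the file path, and the resource values are assumptions made for illustration and are not copied from any file in this pull request.

```python
import json

# Hypothetical manifest entry: layout and values are illustrative assumptions.
example_manifest = {
    "import_specifications": [
        {
            "import_name": "ExampleImport",  # hypothetical name
            "curator_emails": ["support@datacommons.org"],
            "source_files": ["input_files/example.csv"],  # hypothetical path
            "config_override": {"ignore_validation_status": False},
            "resource_limits": {"cpu": 2, "memory": 8, "disk": 50},  # assumed values
        }
    ]
}

# Serialize with uniform indentation and a final newline.
manifest_text = json.dumps(example_manifest, indent=2) + "\n"
print(manifest_text, end="")
```

Setting ignore_validation_status to false means validation results are not ignored, which matches the summary's description of the flag as a way to enable validation.
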
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Summary of Changes

This pull request focuses on standardizing and refining the configuration of various data import pipelines. The changes aim to improve maintainability by centralizing curator contact information, clearly defining source file dependencies, and consistently applying validation settings. These updates contribute to a more robust and transparent data ingestion system.

Highlights

  • Standardized Curator Emails: Updated the 'curator_emails' field to a consistent 'support@datacommons.org' across numerous manifest files, centralizing contact information for data curators.
  • Enhanced Source File Specification: Added or updated the 'source_files' field in many manifest files, explicitly listing the input files required for data import processes. This improves clarity and dependency tracking.
  • Validation Flag Configuration: Introduced or modified the 'config_override' block with 'ignore_validation_status: false' in several Eurostat, OECD, US BLS, US CDC, US Census, US HUD, US Fed, US Crash, World Bank, and FBI manifest files, indicating a standardized approach to validation.
  • Resource Limit Adjustments: Removed specific 'resource_limits' configurations from several manifest files (e.g., Google Sustainability, NCSES, Singapore Census, US Crash) and added them to others (Google Covid Mobility, US EPA National Emissions), streamlining resource allocation definitions.
  • Manifest File Cleanup: Performed general cleanup in various manifest files, including removing redundant top-level 'import_name' fields and obsolete script references, and consolidating configuration overrides.
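
The cleanup described in these highlights could be sketched as a small transformation over one import specification. This is an illustration under stated assumptions, not the script used for this pull request: the helper name clean_spec is hypothetical, and placing the deprecated flags inside config_override is an assumption. The flag names themselves (invoke_import_tool, invoke_differ_tool, invoke_import_validation) are the ones listed in the earlier summary.

```python
import copy

SUPPORT_EMAIL = "support@datacommons.org"
# Flag names as listed in the review summary; where they live in a manifest is assumed.
DEPRECATED_FLAGS = ("invoke_import_tool", "invoke_differ_tool", "invoke_import_validation")


def clean_spec(spec: dict) -> dict:
    """Return a cleaned copy of one import specification (hypothetical helper)."""
    cleaned = copy.deepcopy(spec)
    cleaned["curator_emails"] = [SUPPORT_EMAIL]
    override = cleaned.setdefault("config_override", {})
    for flag in DEPRECATED_FLAGS:
        override.pop(flag, None)
    # Keep validation enforced unless the manifest already sets this flag.
    override.setdefault("ignore_validation_status", False)
    return cleaned


before = {
    "import_name": "ExampleImport",  # hypothetical name
    "curator_emails": ["someone@example.com"],
    "config_override": {"invoke_import_tool": True, "invoke_differ_tool": True},
}
after = clean_spec(before)
print(after)
```

Working on a deep copy leaves the input untouched, which makes it easy to diff the before and after versions when reviewing a bulk change like this one.
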

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request performs a wide-ranging cleanup across numerous manifest files. The changes primarily involve updating curator emails to a standard address, adding source_files for better dependency tracking, and enabling validation by adding config_override flags. While these are valuable improvements, there are several issues to address. I've found a potential bug where a source_files entry seems to be a copy-paste error. Additionally, there are widespread JSON formatting inconsistencies, mainly related to indentation, and many files are missing a final newline character. I've also noted a few other inconsistencies and minor issues. Please review the detailed comments.
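
The indentation and missing-final-newline issues mentioned in this review can be checked mechanically. The sketch below re-serializes a manifest with uniform indentation and exactly one trailing newline; the two-space indent and the helper names are assumptions, since the repository's preferred style is not stated in this thread.

```python
import json


def normalize_manifest_text(text: str, indent: int = 2) -> str:
    """Re-serialize JSON with uniform indentation and exactly one trailing newline."""
    data = json.loads(text)
    return json.dumps(data, indent=indent, ensure_ascii=False) + "\n"


def needs_formatting(text: str) -> bool:
    """True if the text differs from its normalized form."""
    return normalize_manifest_text(text) != text


# Inconsistent indentation and no final newline (hypothetical content).
messy = '{"import_specifications": [\n      {"import_name": "ExampleImport"}]}'
clean = normalize_manifest_text(messy)
print(clean, end="")
```

Because the normalization is idempotent, needs_formatting can serve as a pre-submit check: a file that is already normalized passes unchanged.
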

Comment thread scripts/covid19_india/medical_tests_in_data/manifest.json Outdated
Comment thread scripts/eurostat/health_determinants/manifest.json Outdated
Comment thread scripts/eurostat/regional_statistics_by_nuts/birth_death_migration/manifest.json Outdated
Comment thread scripts/eurostat/regional_statistics_by_nuts/education_attainment/manifest.json Outdated
Comment thread scripts/eurostat/regional_statistics_by_nuts/education_enrollment/manifest.json Outdated
Comment thread statvar_imports/us_bls/bls_ces_state/manifest.json Outdated
Comment thread statvar_imports/us_bls/cpi_category/manifest.json Outdated
Comment thread statvar_imports/us_census/pep/us_census_pep_asrh/manifest.json
Comment thread statvar_imports/us_census/us_monthly_retail_sales/manifest.json Outdated
Comment thread statvar_imports/world_bank/commodity_market/manifest.json Outdated
@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a broad set of cleanup changes across numerous manifest files. Key updates include standardizing curator emails to a general support address, adding source_files properties, and enabling validation flags by adding or modifying config_override sections. While these changes are largely positive for consistency and data integrity, I've identified a few issues, including a potential copy-paste error in a file path, an invalid cron schedule, and several instances of inconsistent JSON formatting. Please review the specific comments for details and suggestions.
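
An invalid cron schedule like the one flagged in this review can often be caught before merge. The following is a minimal sketch that checks the standard five-field numeric cron syntax only; it does not handle month or weekday names, and the function name is hypothetical. The thread does not say which scheduler consumes the manifests or what was wrong with the flagged schedule, so treat the accepted dialect as an assumption.

```python
import re

# (min, max) for minute, hour, day-of-month, month, day-of-week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
# One comma-separated part: "*", "N", or "N-M", optionally followed by "/step".
_PART = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def is_valid_cron(expr: str) -> bool:
    """Check a five-field numeric cron expression (hypothetical helper)."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (low, high) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            match = _PART.match(part)
            if not match:
                return False
            base, step = match.groups()
            if step is not None and int(step) == 0:
                return False
            if base != "*":
                bounds = [int(n) for n in base.split("-")]
                if any(n < low or n > high for n in bounds):
                    return False
                if len(bounds) == 2 and bounds[0] > bounds[1]:
                    return False
    return True


for candidate in ("0 5 * * *", "*/15 0-6 1,15 * 1-5", "0 25 * * *", "0 5 * *"):
    print(candidate, is_valid_cron(candidate))
```
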

Comment thread scripts/india_rbi/below_poverty_line/manifest.json Outdated
Comment thread scripts/covid19_india/medical_tests_in_data/manifest.json Outdated
Comment thread scripts/us_fed/treasury_constant_maturity_rates/manifest.json Outdated
Comment thread scripts/covid_tracking_project/historic_state_data/manifest.json Outdated
Comment thread scripts/eurostat/health_determinants/manifest.json Outdated
Comment thread scripts/oecd/regional_demography/pop_density/manifest.json Outdated
Comment thread scripts/eurostat/health_determinants/manifest.json
@vish-cs vish-cs self-requested a review September 8, 2025 13:44
@vish-cs

vish-cs commented Sep 9, 2025

Contributor

Can we separate the formatting changes from the actual changes (either as a separate PR or as documentation) to help with reviewing?

@vish-cs

vish-cs commented Sep 9, 2025

Contributor

We are planning to enable the validation check for all imports globally in the PR below, so we may not have to enable it individually for each import:
#1605

@vish-cs

vish-cs commented Sep 9, 2025

Contributor

Once we merge this PR, we will need to reschedule all the imports with updated manifests to pick up the changes. Please keep that in mind as a follow-up.
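
As a starting point for that follow-up, the imports declared in the repository's manifests could be enumerated with a short script. This is a sketch under assumptions: the helper name, the "directory:import_name" output format, and the example directory are hypothetical, and the script lists every manifest rather than only those changed in this PR (narrowing to changed files would need a comparison against version control).

```python
import json
import tempfile
from pathlib import Path


def list_import_names(repo_root: Path) -> list[str]:
    """Collect 'relative/dir:import_name' for every manifest.json under repo_root."""
    names = []
    for manifest_path in sorted(repo_root.rglob("manifest.json")):
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        rel_dir = manifest_path.parent.relative_to(repo_root).as_posix()
        for spec in manifest.get("import_specifications", []):
            name = spec.get("import_name")
            if name:
                names.append(f"{rel_dir}:{name}")
    return names


# Demonstration on a throwaway directory tree (hypothetical paths and names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "scripts" / "example_source" / "example_dataset"
    target.mkdir(parents=True)
    content = {"import_specifications": [{"import_name": "ExampleImport"}]}
    (target / "manifest.json").write_text(json.dumps(content) + "\n", encoding="utf-8")
    found = list_import_names(root)
print(found)
```
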

@Avantika-Singh16

Contributor Author

> Can we separate the formatting changes from the actual changes (either as a separate PR or as documentation) to help with reviewing?

We have fixed the lint issues; we have not made any explicit formatting changes.

@Avantika-Singh16

Contributor Author

> We are planning to enable the validation check for all imports globally in the PR below, so we may not have to enable it individually for each import: #1605

In this PR we have only synced the existing Cloud Run and Cloud Batch configurations into the manifest JSON.

@Avantika-Singh16

Contributor Author

> Once we merge this PR, we will need to reschedule all the imports with updated manifests to pick up the changes. Please keep that in mind as a follow-up.

We have already planned for this activity. Once the PR is merged, we will schedule it accordingly.

Comment thread scripts/us_census/pep/population_estimates_by_asr/manifest.json Outdated
Comment thread scripts/india_rbi/below_poverty_line/manifest.json Outdated
Comment thread scripts/us_bts/latch/manifest.json
Comment thread scripts/us_census/pep/us_pep_sex/manifest.json Outdated
Comment thread scripts/us_epa/ghgrp/manifest.json Outdated
Comment thread scripts/us_epa/national_emissions_inventory/manifest.json Outdated
Comment thread scripts/us_fed/treasury_constant_maturity_rates/manifest.json
Comment thread statvar_imports/us_crash/fars_crashdata/manifest.json Outdated
Comment thread statvar_imports/us_census/pep/us_census_pep_asrh/manifest.json
Comment thread statvar_imports/us_census/pep/us_census_pep_asrh/manifest.json
@Avantika-Singh16 Avantika-Singh16 merged commit 6fe332e into datacommonsorg:master Sep 10, 2025
9 checks passed

3 participants