update README with catalog integration with dataOps (PR #43)

Conversation

ryanraaschCDC left a comment:

just minor typos for fixing. Looks great!
In README.md:

## Overview
-The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). This catalog provides a structured framework for managing datasets, workflows, modeling components, and reports related to public health data analysis and forecasting.
+The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). It serves as the organizational layer on top of the CFA dataOps framework, enabling teams to standardize how data assets are described, discovered, and used across projects. This catalog provides a structured framework for managing datasets, workflows, modeling components, and reports related to public health data analysis and forecasting.

ryanraaschCDC: CFA is Center for Forecasting and Outbreak Analytics (missing Outbreak here)
In README.md (same Overview passage):

+The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). It serves as the organizational layer on top of the CFA dataOps framework, enabling teams to standardize how data assets are described, discovered, and used across projects.

ryanraaschCDC: spell dataOps as DataOps
In README.md:

+### How It Integrates with CFA DataOps
+The CFA Data Catalog is tightly coupled with the CFA DataOps framework and functions as its primary interface for dataset definition and discovery.
+
+CFA DataOps provides the execution layer, while the catalog provides the declaratie layer.

ryanraaschCDC: declaratie -> declarative
In README.md:

+ - Provides utilities for accessing datasets and APIs (e.g. Socrata)
+
+ **CFA Catalog Responsibilities**
+ - Defines dataset structure, transformations, adn schemas

ryanraaschCDC: adn -> and
In README.md:

+ 1. A dataset is defined in the catalog with its schema and transformation logic.
+ 2. CFA DataOps reads the catalog definition and executes the corresponding pipeline.
+ 3. Data is validated, transformed, and stored in Azure Blob Storage with versioning.
+ 4. Downstream users access the dataset via standardized interfaces or generat reports using reportcat.

ryanraaschCDC: generat -> generate
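For context, the configuration-driven flow in steps 1 through 4 can be sketched in Python. All names here (`dataset_config`, `run_pipeline`) are illustrative, not the actual cfa-dataops API:

```python
# Sketch of a configuration-driven pipeline: the catalog declares the
# dataset (schema + transform), and an execution layer runs it.
dataset_config = {
    "name": "example_dataset",
    "schema": {"date": str, "count": int},
    # Transform: drop rows with negative counts (illustrative logic).
    "transform": lambda rows: [r for r in rows if r["count"] >= 0],
}

def run_pipeline(config, raw_rows):
    """Validate each row against the declared schema, then apply the transform."""
    for row in raw_rows:
        for field, expected in config["schema"].items():
            if not isinstance(row[field], expected):
                raise TypeError(f"{field!r} should be {expected.__name__}")
    return config["transform"](raw_rows)

rows = run_pipeline(dataset_config, [
    {"date": "2026-04-01", "count": 3},
    {"date": "2026-04-02", "count": -1},
])
print(rows)  # the negative-count row is filtered out
```

The point of the pattern: the definition lives in one declarative object, so the execution layer never hard-codes dataset-specific logic.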
In README.md:

+Recent enhancements further strengthen this integration, including:
+ - LazyFrame loading in Polars for efficient data access without immediate materialization.
+ - Automated schema and mock data generation directly from catalog definitions.
+ - Migration toward Dagster for mor robust orchestration and scheduling.

ryanraaschCDC: mor -> more
In README.md:

+Together, the catalog and CFA DataOps create a unified system where daa engineering is reproducible, discoverable, and scalable.

ryanraaschCDC: daa -> data
In README.md:

+### Getting Started
+New users can begin working with CFA Data Catalog by following these steps:

ryanraaschCDC: change 'CFA Data Catalog' to 'the CFA Public Catalog'
In README.md:

+3. Run or Extend Pipelines
+   Execute existing ETL workflows or define new ones using catalog templates and configuration files.
+
+4. Validate and Test
+   Leverage built-in schema validation and mock data generation to ensure correctness during development

ryanraaschCDC: place a period at the end
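The mock-data generation mentioned in step 4 can be sketched from first principles: given a `{field: type}` schema, emit rows of plausible fake data. The function name and behavior here are made up for illustration, not the catalog's actual tooling:

```python
import random
import string

def mock_rows(schema, n=3, seed=0):
    """Produce n rows of fake data matching a {field: type} schema."""
    rng = random.Random(seed)  # seeded for reproducible test fixtures
    rows = []
    for _ in range(n):
        row = {}
        for field, ftype in schema.items():
            if ftype is int:
                row[field] = rng.randint(0, 100)
            elif ftype is float:
                row[field] = round(rng.random(), 3)
            else:  # default: short random lowercase string
                row[field] = "".join(rng.choices(string.ascii_lowercase, k=5))
        rows.append(row)
    return rows

sample = mock_rows({"state": str, "admissions": int})
```

Because the mock rows are derived from the same schema the pipeline validates against, tests exercise the real validation path without touching real data.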
In README.md:

+### Getting Started
+New users can begin working with CFA Data Catalog by following these steps:
+1. Explore the Catalog

ryanraaschCDC: The numbered step titles for 1 through 4 and the lines beneath them aren't separated by line breaks, so each step renders as one long sentence. In Markdown you need two trailing spaces at the end of a line (or a blank line) to force a line break.
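The fix the reviewer describes looks like this in the README's Markdown (each step title ends with two trailing spaces, and the description is indented under its list item):

```markdown
1. Explore the Catalog  
   Review available datasets, schemas, and workflows defined in the catalog repository.
2. Access Data via CFA DataOps  
   Use CFA DataOps utilities to load datasets into your analysis environment.
```

Indenting the continuation line to align with the item text also keeps it attached to the same numbered item, so the list does not restart.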
Thanks Ryan for your feedback.
Description updated in issue #41 to add information about the integration of the catalog with dataOps.