update README with catalog integration with dataOps #43

Open
HPSLU wants to merge 1 commit into main from hp-41-update-documentation

HPSLU (Collaborator) commented Apr 29, 2026

Description: updates the README (issue #41) to add information about the catalog's integration with dataOps.

HPSLU self-assigned this Apr 29, 2026
HPSLU linked an issue Apr 29, 2026 that may be closed by this pull request
ryanraaschCDC self-requested a review April 29, 2026 19:22
ryanraaschCDC (Collaborator) left a comment

Just minor typos to fix. Looks great!

Comment thread README.md
## Overview

The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). This catalog provides a structured framework for managing datasets, workflows, modeling components, and reports related to public health data analysis and forecasting.
The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). It serves as the organizational layer on top of the CFA dataOps framework, enabling teams to standardize how data assets are described, discovered, and used across projects. This catalog provides a structured framework for managing datasets, workflows, modeling components, and reports related to public health data analysis and forecasting.

CFA stands for Center for Forecasting and Outbreak Analytics ("Outbreak" is missing here).

Comment thread README.md
## Overview

The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). This catalog provides a structured framework for managing datasets, workflows, modeling components, and reports related to public health data analysis and forecasting.
The CFA Catalog: Public (CDCgov) is a comprehensive data management and analysis platform designed for the CDC's Center for Forecasting and Analytics (CFA). It serves as the organizational layer on top of the CFA dataOps framework, enabling teams to standardize how data assets are described, discovered, and used across projects. This catalog provides a structured framework for managing datasets, workflows, modeling components, and reports related to public health data analysis and forecasting.

spell dataOps as DataOps

Comment thread README.md
### How It Integrates with CFA DataOps
The CFA Data Catalog is tightly coupled with the CFA DataOps framework and functions as its primary interface for dataset definition and discovery.

CFA DataOps provides the execution layer, while the catalog provides the declaratie layer.

declarative

Comment thread README.md
- Provides utilities for accessing datasets and APIs (e.g. Socrata)

**CFA Catalog Responsibilities**
- Defines dataset structure, transformations, adn schemas

and

Comment thread README.md
1. A dataset is defined in the catalog with its schema and transformation logic.
2. CFA DataOps reads the catalog definition and executes the corresponding pipeline.
3. Data is validated, transformed, and stored in Azure Blob Storage with versioning.
4. Downstream users access the dataset via standardized interfaces or generat reports using reportcat.

generat -> generate

Comment thread README.md
Recent enhancements further strengthen this integration, including:
- LazyFrame loading in Polars for efficient data access without immediate materialization.
- Automated schema and mock data generation directly from catalog definitions.
- Migration toward Dagster for mor robust orchestration and scheduling.

mor -> more

Comment thread README.md
- Automated schema and mock data generation directly from catalog definitions.
- Migration toward Dagster for mor robust orchestration and scheduling.

Together, the catalog and CFA DataOps create a unified system where daa engineering is reproducible, discoverable, and scalable.

daa -> data

Comment thread README.md
Together, the catalog and CFA DataOps create a unified system where daa engineering is reproducible, discoverable, and scalable.

### Getting Started
New users can begin working with CFA Data Catalog by following these steps:

change 'CFA Data Catalog' to 'the CFA Public Catalog'

Comment thread README.md
Execute existing ETL workflows or define new ones using catalog templates and configuration files.

4. Validate and Test
Leverage built-in schema validation and mock data generation to ensure correctness during development

place a period at the end

Comment thread README.md

### Getting Started
New users can begin working with CFA Data Catalog by following these steps:
1. Explore the Catalog

For some reason, the headers for steps 1 through 4 and the lines beneath them aren't separated by line breaks when rendered, so each step looks like one long sentence. I think in Markdown you need two trailing spaces at the end of a line to force a break onto the next line.
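
A minimal sketch of two ways this could be addressed, using the step 4 text from the diff context in this thread (shown with the trailing period requested in the earlier comment). This is illustrative only and has not been checked against the rendered README. Option A indents the description as its own paragraph inside the list item; the three-space indent matches the width of the `4. ` marker. Option B ends the header line with a backslash, which CommonMark and GitHub Flavored Markdown treat as a hard line break and which, unlike two trailing spaces, stays visible in the source.

```markdown
<!-- Option A: description as an indented paragraph inside the list item -->
4. Validate and Test

   Leverage built-in schema validation and mock data generation to ensure correctness during development.

<!-- Option B: hard line break at the end of the header line -->
4. Validate and Test\
   Leverage built-in schema validation and mock data generation to ensure correctness during development.
```

Whichever form is chosen would need to be applied consistently to steps 1 through 4. Option A produces a "loose" list with extra vertical spacing between items, while Option B keeps the list tight.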

HPSLU (Collaborator, Author) commented Apr 30, 2026 via email

Development

Successfully merging this pull request may close these issues.

update documentation