@SpaceDA SpaceDA commented Oct 14, 2025

What was changed

Refactored convert_proto_to_parquet_flatten to use less memory and run faster, and added test coverage:

Implementation changes:

  • Replaced MessageToJson → MessageToDict to avoid JSON serialization overhead
  • Eliminated intermediate DataFrame creation and concatenation during conversion
  • Built a single list of row dicts, then created the DataFrame once with pd.json_normalize
  • Reduced memory usage by avoiding multiple DataFrame copies
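
As a rough illustration of the single-pass approach described above (not the code in this PR), the sketch below collects one row dict per event and calls pd.json_normalize once at the end. Plain nested dicts stand in for MessageToDict output; the sample data, field names, and the helpers rows_from_workflows and flatten_once are hypothetical.

```python
import pandas as pd


def rows_from_workflows(workflows):
    """Collect one dict per history event across all workflows (no DataFrames yet)."""
    rows = []
    for wf in workflows:
        for event in wf.get("events", []):
            # Tag each event with its workflow so rows stay attributable after flattening.
            rows.append({"workflow_id": wf["workflow_id"], **event})
    return rows


def flatten_once(workflows):
    """Build the DataFrame a single time from the accumulated row dicts."""
    return pd.json_normalize(rows_from_workflows(workflows))


# Hypothetical sample shaped like nested MessageToDict output.
sample = [
    {
        "workflow_id": "wf-1",
        "events": [
            {"eventId": "1", "attributes": {"taskQueue": {"name": "default"}}},
            {"eventId": "2"},
        ],
    },
    {"workflow_id": "wf-2", "events": []},  # workflow with no events
]

df = flatten_once(sample)
```

Nested keys are flattened into dotted column names (for example attributes.taskQueue.name), and events that lack a key get a missing value in that column. Because rows are plain dicts until the final call, no intermediate DataFrame is created or concatenated per workflow.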

Test coverage (10 new tests):

  • Unit tests for convert_proto_to_parquet_flatten using duck-typed fakes to simulate Temporal protos
  • Covers: basic conversion, empty executions, schema validation, column filtering
  • Edge cases: workflows with no events, missing attributes (documents 2 existing bugs)
  • Parametrized tests for multiple workflow scenarios
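
For readers unfamiliar with the duck-typed fake pattern mentioned above, here is a minimal sketch of the idea, not the tests from this PR. The fake classes expose only the attributes the code under test reads; FakeEvent, FakeHistory, FakeWorkflow, and count_events are all hypothetical names, and the real tests target convert_proto_to_parquet_flatten instead.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEvent:
    event_id: int
    event_type: str


@dataclass
class FakeHistory:
    events: list = field(default_factory=list)


@dataclass
class FakeWorkflow:
    history: FakeHistory = field(default_factory=FakeHistory)


def count_events(workflows):
    """Hypothetical function under test: it only reads .history.events."""
    return sum(len(wf.history.events) for wf in workflows)


# Table of scenarios: (name, input workflows, expected event count).
CASES = [
    ("no workflows", [], 0),
    ("workflow with no events", [FakeWorkflow()], 0),
    (
        "one workflow with two events",
        [FakeWorkflow(FakeHistory([FakeEvent(1, "STARTED"), FakeEvent(2, "COMPLETED")]))],
        2,
    ),
]


def run_cases():
    """Check every scenario and return how many were exercised."""
    for name, workflows, expected in CASES:
        assert count_events(workflows) == expected, name
    return len(CASES)
```

In a pytest suite the same table of cases would typically be passed to pytest.mark.parametrize, so each scenario is reported as its own test.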

Why?

The original implementation created multiple intermediate DataFrames and performed expensive concat operations for each workflow, causing high memory usage and slow activity execution on large exports. The refactor builds data more efficiently while maintaining identical output.

Checklist

  1. How was this tested:
    All 10 tests pass (uv run poe test tests/cloud_export_to_parquet/test_data_trans_activities.py -v); see test coverage above.

  2. Any docs updates needed?
    Don't think this applies.

@SpaceDA SpaceDA requested a review from a team as a code owner October 14, 2025 20:08
