refactor: make convert_proto_to_parquet_flatten more memory-efficient #259

SpaceDA · 2025-10-14T20:08:12Z

What was changed

Refactored convert_proto_to_parquet_flatten for better memory efficiency and faster execution, and added test coverage:

Implementation changes:

Replaced MessageToJson with MessageToDict to avoid JSON serialization overhead
Eliminated intermediate DataFrame creation and concatenation during conversion
Built a single list of row dicts, then created the DataFrame once with pd.json_normalize
Reduced memory usage by avoiding multiple DataFrame copies
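
For reviewers, the build-once pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the plain dicts stand in for what MessageToDict would return, and the field names (workflow_id, events, attributes) and the helper flatten_workflows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for per-event dicts; the real export protos
# may use different field names.
workflows = [
    {
        "workflow_id": "wf-1",
        "events": [
            {"eventId": "1", "eventType": "WorkflowExecutionStarted",
             "attributes": {"taskQueue": {"name": "default"}}},
            {"eventId": "2", "eventType": "WorkflowExecutionCompleted",
             "attributes": {"result": "ok"}},
        ],
    },
    {
        "workflow_id": "wf-2",
        "events": [
            {"eventId": "1", "eventType": "WorkflowExecutionStarted",
             "attributes": {"taskQueue": {"name": "batch"}}},
        ],
    },
]


def flatten_workflows(workflows: list) -> pd.DataFrame:
    """Collect one dict per event, then normalize them all in a single call."""
    rows = []
    for wf in workflows:
        for event in wf["events"]:
            # Copy so the input dicts are not mutated, then tag the row
            # with the workflow it belongs to.
            row = dict(event)
            row["workflowId"] = wf["workflow_id"]
            rows.append(row)
    # One DataFrame construction for the whole export, no per-workflow concat.
    return pd.json_normalize(rows)


df = flatten_workflows(workflows)
```

Nested keys come out as dotted column names such as attributes.taskQueue.name, and events that lack a given attribute get a missing value in that column.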

Test coverage (10 new tests):

Unit tests for convert_proto_to_parquet_flatten using duck-typed fakes to simulate Temporal protos
Covers: basic conversion, empty executions, schema validation, column filtering
Edge cases: workflows with no events, missing attributes (documents 2 existing bugs)
Parametrized tests for multiple workflow scenarios
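
As a rough illustration of the duck-typed approach, the sketch below uses small dataclasses that expose only the attributes a conversion function would read. Everything here is hypothetical: the fake classes, the items/history/events attribute layout, and the event_rows function are assumptions for the example, not the fakes or the function in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEvent:
    event_id: int
    event_type: str


@dataclass
class FakeHistory:
    events: list = field(default_factory=list)


@dataclass
class FakeWorkflow:
    history: FakeHistory = field(default_factory=FakeHistory)


@dataclass
class FakeExport:
    items: list = field(default_factory=list)


def event_rows(export) -> list:
    """Hypothetical function under test: one row dict per history event."""
    rows = []
    for wf in export.items:
        for ev in wf.history.events:
            rows.append({"event_id": ev.event_id, "event_type": ev.event_type})
    return rows


def make_export(event_counts):
    """Build a fake export with one workflow per entry in event_counts."""
    workflows = []
    for count in event_counts:
        events = [FakeEvent(i + 1, "FAKE_EVENT") for i in range(count)]
        workflows.append(FakeWorkflow(FakeHistory(events)))
    return FakeExport(workflows)


# Table of (events per workflow, expected row count) scenarios.
cases = [
    ([], 0),      # empty export
    ([0], 0),     # one workflow with no events
    ([2], 2),     # one workflow, two events
    ([1, 3], 4),  # multiple workflows
]

for event_counts, expected in cases:
    assert len(event_rows(make_export(event_counts))) == expected
```

The cases table has the shape one would pass to pytest.mark.parametrize; a plain loop is used here only to keep the sketch free of third-party imports.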

Why?

The original implementation created multiple intermediate DataFrames and performed expensive concat operations for each workflow, causing high memory usage and slow activity execution on large exports. The refactor builds data more efficiently while maintaining identical output.

Checklist

How was this tested:
All 10 tests pass (uv run poe test tests/cloud_export_to_parquet/test_data_trans_activities.py -v); see test coverage above.
Any docs updates needed?
Don't think this applies.

CLAassistant · 2025-10-14T20:08:20Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

SpaceDA added 2 commits October 14, 2025 14:37

refactor: make convert_proto_to_parquet_flatten more memory-efficient

4919381

add tests for proto to parquet

548acbc

SpaceDA requested a review from a team as a code owner October 14, 2025 20:08
