update csv to parquet to use multiprocessing #795

Merged
shorvath-noaa merged 1 commit into NOAA-OWP:master from JoshCu:parallel_csv_to_parquet
Jul 10, 2024

@JoshCu JoshCu commented Jul 3, 2024

Multiprocessing for Faster Input Reformatting

When routing 6500 catchments over 24 timesteps, a large share of troute's execution time is still spent reformatting ngen output. PR #714 partially addressed this with an awk command. This PR adds multiprocessing to speed up the reformatting further.
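The general pattern can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `parse_nexus_csv`, `read_nexus_outputs`, the `nex-<id>_output.csv` naming scheme, and the headerless (index, timestamp, flow) row layout are all assumptions made for the example. It uses only the standard library, so the final parquet write is left out.

```python
import csv
import multiprocessing
from pathlib import Path


def parse_nexus_csv(path):
    """Read one per-nexus CSV and return (nexus_id, [flow values]).

    Assumes a headerless file whose last column is the flow value.
    """
    path = Path(path)
    # Hypothetical naming scheme: "nex-123_output.csv" -> "123".
    nexus_id = path.stem.split("_")[0].split("-")[-1]
    flows = []
    with path.open(newline="") as handle:
        for row in csv.reader(handle):
            if row:  # skip blank lines
                flows.append(float(row[-1]))
    return nexus_id, flows


def read_nexus_outputs(paths, processes=None):
    """Parse many CSV files, one file per task, across worker processes.

    processes=None falls back to the machine's CPU core count.
    """
    paths = list(paths)
    if processes is None:
        processes = multiprocessing.cpu_count()
    # Never start more workers than there are files to read.
    processes = max(1, min(processes, len(paths)))
    if processes == 1:
        # Serial fallback: avoids pool start-up cost for tiny inputs.
        return dict(map(parse_nexus_csv, paths))
    with multiprocessing.Pool(processes=processes) as pool:
        return dict(pool.map(parse_nexus_csv, paths))
```

Because each file is parsed independently, the work splits cleanly across processes. The combined dictionary could then be turned into a single table and written to parquet in the parent process. On platforms that start workers with `spawn`, the call to `read_nexus_outputs` should sit under an `if __name__ == "__main__":` guard.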

Performance Comparison

Without Multiprocessing

************ TIMING SUMMARY ************
----------------------------------------
Network graph construction: 0.8 secs,  7.27%
Forcing array construction: 7.7 secs, 69.93%
Routing computations:      1.89 secs, 17.20%
Output writing:            0.61 secs,  5.56%
----------------------------------------

With Multiprocessing (56 cores, the default, which is the CPU core count)

************ TIMING SUMMARY ************
----------------------------------------
Network graph construction: 0.84 secs, 19.68%
Forcing array construction: 0.98 secs, 22.94%
Routing computations:      1.82 secs, 42.92%
Output writing:            0.61 secs, 14.36%
----------------------------------------

With Multiprocessing (4 cores - hardcoded example)

************ TIMING SUMMARY ************
----------------------------------------
Network graph construction: 0.79 secs, 14.09%
Forcing array construction: 2.36 secs, 42.39%
Routing computations:       1.8 secs, 32.29%
Output writing:            0.62 secs, 11.15%
----------------------------------------
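For a quick read of the three summaries, the forcing array construction times work out to roughly a 7.9x speedup on 56 cores and 3.3x on 4 cores. The short calculation below uses only the numbers reported above. The "parallel efficiency" figure (speedup divided by core count) is an added convenience metric, not something measured in this PR, and it treats the whole stage as if it were parallelizable.

```python
# Forcing array construction times (seconds) copied from the timing summaries.
forcing_secs = {"serial": 7.7, "56 cores": 0.98, "4 cores": 2.36}
cores = {"56 cores": 56, "4 cores": 4}

baseline = forcing_secs["serial"]
speedup = {}
efficiency = {}
for label, n in cores.items():
    # Speedup relative to the run without multiprocessing.
    speedup[label] = baseline / forcing_secs[label]
    # Fraction of ideal linear scaling achieved on n cores.
    efficiency[label] = speedup[label] / n
    print(f"{label}: {speedup[label]:.2f}x speedup, "
          f"{efficiency[label]:.0%} parallel efficiency")
```

The 4-core run gets most of the benefit per core, while the 56-core run is fastest in absolute terms.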

Notes

  • This works in ngiab with ngen run serially.
  • Testing this via ngen with MPI will actually reduce performance due to an issue being worked on in NOAA-OWP/ngen#846.
  • Building ngen from that PR should work.
  • An ngiab image for x86 with both the MPI patch and this troute patch applied is available at joshcu/ngiab_dev.

@shorvath-noaa shorvath-noaa merged commit ed8a105 into NOAA-OWP:master Jul 10, 2024
@JoshCu JoshCu deleted the parallel_csv_to_parquet branch July 11, 2024 15:35