Poor scaling for runs with stochastic perturbations (in RRFSv1 system) #91

@MatthewPyle-NOAA

Description

While testing the RRFS ensemble system on more nodes, we noticed that the timing improvements from the added resources weren't very good, and seemed to level off pretty significantly at higher node counts. As background, we've been running our experiments with 52 nodes, but plan to run in production with about 92 nodes. Out of curiosity we increased the node count, first from 92 to 110 nodes (about a 19.6% increase), but the wall clock time only decreased by about 3.5%. Our deterministic runs, which don't use perturbations, scale quite a bit better. We weren't sure whether this is known behavior. We are running on a 3950 x 2700 grid with 3 km grid spacing, and due to stability needs we have to regenerate the stochastic patterns quite frequently (lndpint=180, i.e. every 5 time steps; sppint=36, i.e. every time step).

WCOSS2 nodes    wall clock (s)
52              11525
92              8463    (from 52 to 92: ~77% more nodes, about 26.6% faster)
102             8251
110             8163    (from 92 to 110: ~19.6% more nodes, about 3.5% faster)
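
To put the numbers above in more conventional scaling terms, here is a minimal Python sketch that computes speedup and parallel efficiency relative to the 52-node run. It uses only the wall clock times from the table; the `scaling` helper and the choice of 52 nodes as the baseline are just illustrative assumptions, not part of any RRFS tooling.

```python
# Wall clock times copied from the table above (WCOSS2 nodes -> seconds).
timings = {52: 11525, 92: 8463, 102: 8251, 110: 8163}


def scaling(base_nodes, nodes, timings):
    """Return (speedup, parallel efficiency) of `nodes` relative to `base_nodes`.

    speedup    = T(base) / T(nodes)
    efficiency = speedup / (nodes / base_nodes), where 1.0 is ideal strong scaling
    """
    speedup = timings[base_nodes] / timings[nodes]
    ideal = nodes / base_nodes
    return speedup, speedup / ideal


if __name__ == "__main__":
    for n in sorted(timings):
        s, e = scaling(52, n, timings)
        print(f"{n:4d} nodes: speedup {s:.2f}x, efficiency {e:.0%}")
```

Calling `scaling(92, 110, timings)` instead gives the incremental view of the 92 to 110 node step described in the text.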

Tagging @JiliDong-NOAA as well.