[FEA] Auto tuning/node recommendation missing features #1152

tgravescs · 2024-06-28T13:32:29Z

Is your feature request related to a problem? Please describe.
This is a list of things I ran into that I think are missing or that could make the user's life easier:

Databricks - specify the number of workers, since Databricks doesn't set spark.executor.instances
EMR - expand the recommendation logic to handle multi-GPU nodes
EMR - need to specify the number of workers as well, specifically when we move to machines with 1 GPU while the CPU cluster could have been using larger nodes with fewer of them.
Some of our recommendations combine executors when choosing the machine size, but the GPU machines we recommend have only a single GPU, so we end up halving the number of executors, which on GPU could have a big performance impact. For instance, the CPU side on EMR has 16 cores per node and uses 2 executors per node. We will say to use nodes with 32 cores, but such a node has only 1 GPU.
When recommending the number of nodes, we currently recommend more nodes if the GPU node type supports fewer GPUs per node, so that we keep the same number of executors. We don't handle the opposite case, where we might recommend a node type with more GPUs per node, which can host more executors and would therefore need fewer nodes.
Node types shouldn't be hardcoded
More Azure node types
Add driver instance recommendation to the Scala code
Enhance the Python shape recommendation to include the number of GPUs and the number of nodes
Dataproc - the auth error is not obvious enough when node recommendation fails
Switch hardcoded node maps to lookups from JSON files generated from the node types on each CSP
Replace hardcoded instance information in Platform.scala with files generated from CSP information
...
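The node-count reasoning in the EMR items above can be sketched as follows. This is a hypothetical illustration, not the tool's actual recommendation logic: it assumes a uniform cluster, one executor per GPU, and a CPU cluster of 4 nodes with 2 executors each (the issue gives no node count). The names `GpuNodeShape`, `total_executors`, and `recommend_num_gpu_nodes` are made up for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuNodeShape:
    """Hypothetical GPU instance description (name and GPU count only)."""
    name: str
    gpus_per_node: int


def total_executors(num_nodes: int, executors_per_node: int) -> int:
    """Total executor count of a cluster with uniform nodes."""
    return num_nodes * executors_per_node


def recommend_num_gpu_nodes(target_executors: int, shape: GpuNodeShape) -> int:
    """Smallest GPU node count that keeps at least `target_executors`,
    assuming one executor per GPU."""
    if shape.gpus_per_node < 1:
        raise ValueError("a GPU node shape needs at least one GPU")
    # Round up: a partially used node still has to be provisioned.
    return math.ceil(target_executors / shape.gpus_per_node)


if __name__ == "__main__":
    # Assumed CPU cluster: 4 nodes x 16 cores, 2 executors per node.
    cpu_executors = total_executors(num_nodes=4, executors_per_node=2)

    # Pitfall: keep the same 4 nodes but switch to 1-GPU nodes,
    # which halves the executor count.
    print("same node count, 1 GPU each:", total_executors(4, 1))

    # Scaling the node count by GPUs per node keeps the executor count
    # in both directions: more nodes for fewer GPUs, fewer for more GPUs.
    for shape in (GpuNodeShape("1-gpu-node", 1), GpuNodeShape("4-gpu-node", 4)):
        nodes = recommend_num_gpu_nodes(cpu_executors, shape)
        print(shape.name, "->", nodes, "nodes,", nodes * shape.gpus_per_node, "executors")
```

Dividing the executor target by the GPUs per node and rounding up covers both cases described in the list: a node type with fewer GPUs needs more nodes, and one with more GPUs needs fewer.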

tgravescs · 2024-07-02T18:37:59Z

This needs to be broken up; @tgravescs to handle that once the list is finalized.

tgravescs added the feature request (New feature or request) and ? - Needs Triage labels on Jun 28, 2024

amahussein added the tools label and removed the ? - Needs Triage label on Jul 2, 2024

tgravescs self-assigned this Jul 2, 2024
