Is your feature request related to a problem? Please describe.
This is a list of things I ran into that I think are missing or could make the user's life easier:
Databricks - specify the number of workers, since it doesn't specify spark.executor.instances
EMR - expand recommendation logic to handle multiple GPU nodes
EMR - need to specify the number of workers as well, specifically when we move to machines with 1 GPU and the CPU cluster could have been using fewer, larger nodes.
Some of our recommendations combine executors to choose the machine size, but the node types we have contain only a single GPU, so we are halving the number of executors, which with GPU could have a big performance impact. For instance, the CPU side on EMR has 16 cores per node and uses 2 executors per node. We will say to use nodes with 32 cores, but that node type has only 1 GPU.
When recommending the number of nodes, we currently recommend more nodes when the GPU node type supports fewer GPUs, so that the number of executors stays the same. We don't handle the opposite case, where we recommend a node type with more GPUs per node, can put more executors on each node, and so would need fewer nodes.
Node types shouldn't be hardcoded
More Azure node types
Add driver instance recommendation to the Scala code
Enhance the Python shape recommendation to include the number of GPUs and the number of nodes
Dataproc auth error is not obvious enough when node recommendation fails
Switch hardcoded node maps to a lookup from JSON files generated from the node types on each CSP
Replace hardcoded instance information in Platform.scala with files generated from CSP information
...
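To illustrate the node-count items above, here is a minimal Python sketch of a recommendation that keeps the total executor count constant in both directions: more nodes when the GPU node type has fewer GPUs, fewer nodes when it has more. It assumes one executor per GPU, which is an assumption made for this sketch. The function name `recommend_num_nodes` and its parameters are hypothetical and do not come from the existing Scala or Python code.

```python
import math


def recommend_num_nodes(cpu_num_nodes: int,
                        cpu_executors_per_node: int,
                        gpus_per_node: int) -> int:
    """Return the number of GPU nodes needed to keep the executor count.

    Hypothetical helper. Assumes one executor per GPU, so a GPU node
    can host at most `gpus_per_node` executors.
    """
    if min(cpu_num_nodes, cpu_executors_per_node, gpus_per_node) < 1:
        raise ValueError("all arguments must be positive integers")
    # Total executors in the original CPU cluster.
    total_executors = cpu_num_nodes * cpu_executors_per_node
    # Round up so no executor is dropped when the division is not exact.
    return math.ceil(total_executors / gpus_per_node)
```

For the EMR case described above (2 executors per CPU node, GPU nodes with 1 GPU each), a hypothetical 4-node CPU cluster gives `recommend_num_nodes(4, 2, 1) == 8`, while a node type with 4 GPUs gives `recommend_num_nodes(4, 2, 4) == 2`.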