Description:
Running inference with the enable_workflow (Ray Workflow) option causes all processes to be pinned to a single core.
Steps to Reproduce:
Follow the conda or container installation instructions and run inference with the --enable_workflow option
Expected Behavior:
The workload is expected to be spread across the resources available in the Ray cluster, i.e. processes should run on different cores.
Actual Behavior:
All processes and Ray workers share the same CPU/core affinity.
In top, note that the P (last-used CPU) column is 0 for every FastFold process.
This is also confirmed with taskset (output is truncated)
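The same check can be done programmatically. A minimal sketch (not part of FastFold; it just inspects the calling process, the way `taskset -cp <pid>` does):

```python
import os

# Query the CPU affinity mask of the current process. On an affected
# run, every FastFold/Ray worker process reports the same single-core
# mask, e.g. {0}; on a healthy setup the mask covers all cores.
affinity = os.sched_getaffinity(0)
print(f"pid {os.getpid()} may run on cores: {sorted(affinity)}")
```

Running this inside each Ray worker (or pointing `taskset -cp` at the worker PIDs) makes the pinning visible without top.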
Environment:
Steps Taken to Resolve:
This appears to be a PyTorch issue; see ray-project/ray/issues/34201 and pytorch/pytorch/issues/99625. One fix is to set KMP_AFFINITY to disabled before running inference:
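For example (the inference command below is illustrative; substitute your usual FastFold invocation):

```shell
# Disable the thread-affinity pinning applied by the Intel OpenMP
# runtime that PyTorch links against, so Ray workers are free to
# migrate across cores.
export KMP_AFFINITY=disabled

# Then launch inference as usual, e.g.:
# python inference.py ... --enable_workflow

echo "KMP_AFFINITY=$KMP_AFFINITY"
```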