Question: Why is virtualization enabled when the number of devices listed in the environment variable equals the actual device count? #9

Open
lut777 opened this issue Jul 12, 2024 · 0 comments


lut777 commented Jul 12, 2024

I have a question about the following code.

In the file /src/utils.c, this snippet decides whether virtualization is needed:

int getenvcount() {
    char *s = getenv("CUDA_VISIBLE_DEVICES");
    if ((s == NULL) || (strlen(s)==0)){
        return -1;
    }
    LOG_DEBUG("get from env %s",s);
    int i,count=0;
    for (i=0;i<strlen(s);i++){
        if (s[i]==',')
            count++;
    }
    return count+1;
}

int need_cuda_virtualize() {
    int count1 = -1;
    char *s = getenv("CUDA_VISIBLE_DEVICES");
    if ((s == NULL) || (strlen(s)==0)){
        return 0;
    }
    int fromenv = getenvcount();
    CUresult res = CUDA_OVERRIDE_CALL(cuda_library_entry,cuDeviceGetCount,&count1);
    if (res != CUDA_SUCCESS) {
        return 1;
    }
    LOG_WARN("count1=%d",count1);
    if (fromenv ==count1) {
        return 1;
    }
    return 0;
}

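I don't quite understand why virtualization is enabled when the actual number of GPUs equals the number of device indices listed for virtualization.
My understanding is that the number of GPUs after virtualization should differ from the original count. Otherwise the result is just a reordering of the same devices.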
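
To make the condition easier to reason about, here is a minimal sketch that restates the quoted logic without the CUDA driver call. The names count_visible_entries and would_virtualize are hypothetical and not part of the project; the device count is passed in as a plain integer, and a negative value is my stand-in for cuDeviceGetCount failing.

```c
#include <stddef.h>

/* Count comma-separated entries in a CUDA_VISIBLE_DEVICES-style string.
 * Returns -1 for NULL or an empty string, mirroring getenvcount(). */
int count_visible_entries(const char *s) {
    if (s == NULL || s[0] == '\0') {
        return -1;
    }
    int count = 1;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == ',') {
            count++;
        }
    }
    return count;
}

/* Same decision as the quoted need_cuda_virtualize(), with the device
 * count supplied by the caller instead of queried from the driver.
 * Assumption: driver_count < 0 models a failed cuDeviceGetCount call. */
int would_virtualize(const char *visible, int driver_count) {
    if (visible == NULL || visible[0] == '\0') {
        return 0;   /* variable unset or empty: no virtualization */
    }
    if (driver_count < 0) {
        return 1;   /* driver query failed: the quoted code returns 1 */
    }
    return count_visible_entries(visible) == driver_count;
}
```

With this sketch, would_virtualize("1,0", 2) returns 1 and would_virtualize("0", 2) returns 0, which is the behavior I am asking about: only the case where both counts match enables virtualization.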

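Also, once virtualization is enabled, shouldn't the contents of cuda_to_nvml_map be cleared first, before anything is copied into it?
I mean this part of the code.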
