FEAT: Support pygloo in collective communication#38
mergify[bot] merged 18 commits into xorbitsai:main from YibinLiu666:pygloo
Codecov Report
@@            Coverage Diff             @@
##             main      #38      +/-   ##
==========================================
- Coverage   93.85%   93.78%   -0.07%
==========================================
  Files          43       42       -1
  Lines        3399     3361      -38
  Branches      675      672       -3
==========================================
- Hits         3190     3152      -38
  Misses        138      138
  Partials       71       71
Also, we need to support
Implemented
codingl2k1 left a comment
We should keep only one Store type and wrap the TCPStore in a PrefixStore directly.
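To illustrate the suggestion, here is a minimal sketch of wrapping a single store type in a prefixing decorator. The `Store` interface, the `InMemoryStore` stand-in (used here instead of a real socket-backed TCPStore), and the `"/"` key separator are all assumptions made for the example. They are not the actual xoscar or gloo classes.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal key-value store interface (not the real gloo/xoscar API).
class Store {
public:
    virtual ~Store() = default;
    virtual void set(const std::string &key, const std::vector<char> &data) = 0;
    virtual std::vector<char> get(const std::string &key) = 0;
};

// Stand-in for a TCPStore: keeps data in memory instead of talking to a socket.
class InMemoryStore : public Store {
public:
    void set(const std::string &key, const std::vector<char> &data) override {
        data_[key] = data;
    }
    std::vector<char> get(const std::string &key) override {
        auto it = data_.find(key);
        if (it == data_.end()) {
            throw std::out_of_range("key not found: " + key);
        }
        return it->second;
    }
    bool contains(const std::string &key) const { return data_.count(key) > 0; }

private:
    std::map<std::string, std::vector<char>> data_;
};

// Decorator that namespaces every key, so several groups can share one store.
class PrefixStore : public Store {
public:
    PrefixStore(std::string prefix, Store &inner)
        : prefix_(std::move(prefix)), inner_(inner) {}

    void set(const std::string &key, const std::vector<char> &data) override {
        inner_.set(joinKey(key), data);
    }
    std::vector<char> get(const std::string &key) override {
        return inner_.get(joinKey(key));
    }

private:
    // The "/" separator is an arbitrary choice for this sketch.
    std::string joinKey(const std::string &key) const { return prefix_ + "/" + key; }

    std::string prefix_;
    Store &inner_;  // non-owning: the wrapped store must outlive the wrapper
};
```

With this shape, two `PrefixStore` instances built over the same underlying store can both use a key such as `"rank0"` without colliding, which is why a single concrete store type plus a wrapper may be enough.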
std::vector<T> inputbuf(size);
memcpy(inputbuf.data(), input_ptr, size * sizeof(T));
Is this mem copy required by gloo?
Gloo's ReduceScatterHalvingDoubling algorithm accepts a vector with just one pointer: pytorch/gloo#303
I am not sure whether we can remove this copy in the future.
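The staging pattern under discussion can be sketched as follows. This assumes the collective works in place on the single buffer it is given. `inPlaceDouble` is a hypothetical stand-in for the algorithm's `run()` step, not a gloo function, and `runOnStagedCopy` is likewise a name invented for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an in-place collective that receives a vector
// holding exactly one data pointer. It simply doubles each element.
template <typename T>
void inPlaceDouble(const std::vector<T *> &ptrs, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        ptrs[0][i] = ptrs[0][i] * 2;
    }
}

// Copy the caller's input into an owned buffer, run the in-place step on that
// buffer, then copy the result out. The caller's input is never modified.
template <typename T>
void runOnStagedCopy(const T *input, T *output, std::size_t size) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy staging requires a trivially copyable element type");

    std::vector<T> inputbuf(size);
    std::memcpy(inputbuf.data(), input, size * sizeof(T));

    // The algorithm is handed a vector with a single pointer into the copy.
    std::vector<T *> dataPtrs{inputbuf.data()};
    inPlaceDouble(dataPtrs, size);

    std::memcpy(output, inputbuf.data(), size * sizeof(T));
}
```

Under that in-place assumption, the copy is what keeps the caller's input buffer untouched. Removing it would mean letting the algorithm overwrite the input directly.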
bool TCPStore::hasExtendedApi() const { return true; }

void TCPStore::set(const std::string &key, const std::vector<char> &data) {
  std::vector<uint8_t> dataSet(data.begin(), data.end());
This copies the data just to convert the vector<char> to a vector<uint8_t>, but the send function accepts a const char * here: https://github.com/xorbitsai/xoscar/blob/main/cpp/collective/rendezvous/include/utils.hpp#L121
::send(socket, (const char *) currentBytes, bytesToSend, flags)
We should unify the data types to char* to avoid copying data.
> This copies the data just to convert the vector<char> to a vector<uint8_t>, but the send function accepts a const char * here: https://github.com/xorbitsai/xoscar/blob/main/cpp/collective/rendezvous/include/utils.hpp#L121
> ::send(socket, (const char *) currentBytes, bytesToSend, flags)
> We should unify the data types to char* to avoid copying data.
See issue pybind/pybind11#1807 in pybind11. Copying data here is just for pybind11.
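For comparison, the two variants discussed in this thread can be sketched side by side. `RecordingSink` and its `sendBytes` method are hypothetical stand-ins for the socket-level send, and `setWithCopy` / `setWithoutCopy` are names invented for the example. This does not address the pybind11 constraint mentioned above. It only shows what forwarding the bytes without the intermediate vector would look like if the lower layer took a `const char *` and a length.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical byte sink standing in for a socket-level send that takes
// a const char * and a length. It records what it receives.
struct RecordingSink {
    std::string received;
    void sendBytes(const char *data, std::size_t length) {
        received.append(data, length);
    }
};

// Copying variant: converts vector<char> to vector<uint8_t> first,
// mirroring the conversion in the reviewed diff.
void setWithCopy(RecordingSink &sink, const std::vector<char> &data) {
    std::vector<std::uint8_t> dataSet(data.begin(), data.end());
    sink.sendBytes(reinterpret_cast<const char *>(dataSet.data()), dataSet.size());
}

// Non-copying variant: forwards the caller's bytes directly.
void setWithoutCopy(RecordingSink &sink, const std::vector<char> &data) {
    sink.sendBytes(data.data(), data.size());
}
```

Both variants put the same bytes on the wire. The second avoids one allocation and one pass over the data per call.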
What do these changes do?
Related issue number
Related #22
Check code requirements