FEAT: Support pygloo in collective communication#38

Merged
mergify[bot] merged 18 commits into xorbitsai:main from YibinLiu666:pygloo
Jul 13, 2023

Conversation

@YibinLiu666
Contributor

@YibinLiu666 YibinLiu666 commented Jul 5, 2023

What do these changes do?

Related issue number

Related #22

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass

@XprobeBot XprobeBot added this to the v0.0.6 milestone Jul 5, 2023
@ChengjieLi28 ChengjieLi28 changed the title FEATURE: Support pygloo in collective communication FEAT: Support pygloo in collective communication Jul 5, 2023
@codecov

codecov Bot commented Jul 5, 2023

Codecov Report

Merging #38 (02869a8) into main (892e1c5) will decrease coverage by 0.07%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main      #38      +/-   ##
==========================================
- Coverage   93.85%   93.78%   -0.07%     
==========================================
  Files          43       42       -1     
  Lines        3399     3361      -38     
  Branches      675      672       -3     
==========================================
- Hits         3190     3152      -38     
  Misses        138      138              
  Partials       71       71              
Flag Coverage Δ
unittests 93.63% <ø> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.

@XprobeBot XprobeBot modified the milestones: v0.0.6, v0.0.9 Jul 7, 2023
ChengjieLi28 previously approved these changes Jul 7, 2023
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt
Comment thread CMakeLists.txt
Comment thread cpp/collective/gloo/src/rendezvous.cc Outdated
Comment thread python/xoscar/collective/gloo/xoscar_pygloo.pyi
Comment thread python/xoscar/collective/gloo/xoscar_pygloo.pyi
Comment thread python/xoscar/collective/test/test_pygloo_tcp_store.py
Comment thread python/xoscar/collective/test/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/test/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/gloo/test/__init__.py Outdated
@ChengjieLi28
Contributor

Also, we need to support the all_to_all binding.

@YibinLiu666
Contributor Author

> Also, we need to support the all_to_all binding.

Implemented

Comment thread CMakeLists.txt Outdated
Comment thread python/xoscar/tests/core.py Outdated
Comment thread python/xoscar/tests/core.py Outdated
Contributor

@codingl2k1 codingl2k1 left a comment

We should keep only one Store type and wrap the TCPStore in a PrefixStore directly.
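
As a rough illustration of the wrapping idea in this comment, here is a minimal, self-contained sketch. The `Store`, `InMemoryStore`, and `PrefixedStore` names are hypothetical and are not the gloo or xoscar classes; the in-memory store merely stands in for a TCP-backed one, and the `/` separator is an assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal store interface (not the gloo or xoscar API).
class Store {
public:
    virtual ~Store() = default;
    virtual void set(const std::string &key, const std::vector<char> &data) = 0;
    virtual std::vector<char> get(const std::string &key) = 0;
};

// Stand-in for a TCP-backed store; keeps the data in memory for illustration.
class InMemoryStore : public Store {
public:
    void set(const std::string &key, const std::vector<char> &data) override {
        data_[key] = data;
    }
    std::vector<char> get(const std::string &key) override {
        auto it = data_.find(key);
        assert(it != data_.end() && "key must exist in this sketch");
        return it->second;
    }

private:
    std::map<std::string, std::vector<char>> data_;
};

// Wraps any Store and namespaces every key with a fixed prefix, so a single
// underlying store type can serve several independent groups.
class PrefixedStore : public Store {
public:
    PrefixedStore(std::string prefix, std::shared_ptr<Store> base)
        : prefix_(std::move(prefix)), base_(std::move(base)) {}

    void set(const std::string &key, const std::vector<char> &data) override {
        base_->set(joinKey(key), data);
    }
    std::vector<char> get(const std::string &key) override {
        return base_->get(joinKey(key));
    }

private:
    // The "/" separator is an arbitrary choice for this sketch.
    std::string joinKey(const std::string &key) const {
        return prefix_ + "/" + key;
    }

    std::string prefix_;
    std::shared_ptr<Store> base_;
};
```

With this shape, callers only ever see one `Store` interface, and the prefix wrapper can be layered over whichever concrete store is in use.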

Comment thread CMakeLists.txt
Comment thread CMakeLists.txt
Comment thread cpp/collective/gloo/src/rendezvous.cc
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo_tcp_store.py Outdated
Comment thread python/xoscar/collective/gloo/xoscar_pygloo.pyi Outdated
Comment thread cpp/CMakeLists.txt Outdated
Comment thread cpp/collective/rendezvous/src/tcp_store.cpp Outdated
Comment thread cpp/collective/rendezvous/src/tcp_store.cpp Outdated
Comment thread cpp/collective/rendezvous/src/tcp_store.cpp Outdated
Comment thread python/xoscar/collective/xoscar_pygloo.pyi
Comment thread python/xoscar/collective/tests/test_pygloo.py
Comment thread python/xoscar/collective/tests/test_pygloo.py Outdated
Comment thread python/xoscar/collective/tests/test_pygloo.py Outdated

std::vector<T> inputbuf(size);

memcpy(inputbuf.data(), input_ptr, size * sizeof(T));
Contributor

Is this memory copy required by gloo?

Contributor

gloo's ReduceScatterHalvingDoubling algorithm accepts a vector with just one pointer: pytorch/gloo#303
I am not sure whether we can remove this copy in the future.
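
To make the pattern under discussion concrete, here is a minimal sketch of copying the caller's input into a scratch vector and passing a one-element pointer list to an algorithm. It assumes the algorithm works in place on the buffers it is given, which this thread does not state. `inplace_double` is a hypothetical stand-in, not a gloo call, and `run_on_copy` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an in-place collective that takes a list of
// buffers; here it simply doubles every element of each buffer.
template <typename T>
void inplace_double(const std::vector<T *> &ptrs, std::size_t count) {
    for (T *p : ptrs) {
        for (std::size_t i = 0; i < count; ++i) {
            p[i] *= 2;
        }
    }
}

// Copy the caller's input into a scratch vector, then hand the algorithm a
// one-element pointer list that refers to the scratch copy. The caller's
// buffer is never written to.
template <typename T>
std::vector<T> run_on_copy(const T *input_ptr, std::size_t size) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable element type");
    assert(input_ptr != nullptr || size == 0);

    std::vector<T> inputbuf(size);
    if (size > 0) {
        std::memcpy(inputbuf.data(), input_ptr, size * sizeof(T));
    }

    std::vector<T *> ptrs{inputbuf.data()};  // vector holding a single pointer
    inplace_double(ptrs, size);
    return inputbuf;
}
```

The cost of this approach is one extra allocation and copy per call, which is what the question above is about.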

bool TCPStore::hasExtendedApi() const { return true; }

void TCPStore::set(const std::string &key, const std::vector<char> &data) {
std::vector<uint8_t> dataSet(data.begin(), data.end());
Contributor

@codingl2k1 codingl2k1 Jul 13, 2023

The data is copied here to convert the vector<char> to a vector<uint8_t>, but the send function accepts a const char * (see https://github.com/xorbitsai/xoscar/blob/main/cpp/collective/rendezvous/include/utils.hpp#L121):

::send(socket, (const char *) currentBytes, bytesToSend, flags)

We should unify the data types to char* to avoid copying data.

Contributor

> The data is copied here to convert the vector<char> to a vector<uint8_t>, but the send function accepts a const char * (see https://github.com/xorbitsai/xoscar/blob/main/cpp/collective/rendezvous/include/utils.hpp#L121):
>
> ::send(socket, (const char *) currentBytes, bytesToSend, flags)
>
> We should unify the data types to char* to avoid copying data.

See pybind11 issue pybind/pybind11#1807. The copy here is needed only for pybind11.
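
For reference, the reviewer's suggestion can be sketched in plain C++ as follows: the storage of a `std::vector<char>` is viewed as unsigned bytes through a pointer cast instead of being copied into a second vector. This only illustrates the C++ side and does not address the pybind11 constraint mentioned above. `sum_bytes`, `as_bytes`, and `sum_without_copy` are hypothetical names, and the sketch assumes `std::uint8_t` is an alias for `unsigned char`, as it is on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical consumer that expects unsigned bytes; it sums them so the
// result is easy to check.
std::uint64_t sum_bytes(const std::uint8_t *bytes, std::size_t n) {
    assert(bytes != nullptr || n == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += bytes[i];
    }
    return total;
}

// View the storage of a vector<char> as unsigned bytes without copying.
// Reading char objects through an unsigned char pointer is allowed by the
// aliasing rules; this relies on uint8_t being unsigned char.
const std::uint8_t *as_bytes(const std::vector<char> &data) {
    return reinterpret_cast<const std::uint8_t *>(data.data());
}

// Passes the original storage straight to the consumer; no second vector.
std::uint64_t sum_without_copy(const std::vector<char> &data) {
    return sum_bytes(as_bytes(data), data.size());
}
```

The returned pointer is only valid while the vector is alive and unmodified, so a view like this suits short-lived calls such as a send rather than stored state.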

Contributor

@codingl2k1 codingl2k1 left a comment

LGTM

@mergify mergify Bot merged commit 7db36a2 into xorbitsai:main Jul 13, 2023

4 participants