maybe a potential bug of neighbor sampling of distributed dgl heterogeneous graph #7473

Open
yfismine opened this issue Jun 23, 2024 · 7 comments

@yfismine

🐛 Bug

To Reproduce

As far as I understand, this may be a bug in distributed DGL. For neighbor sampling, the CSRRowWisePerEtypePick function in rowwise_pick.h contains the following code to determine the type of an edge.
[image: screenshot of the code in question]
This code works when every edge is an inner edge of the partition, but an outer edge can trigger the assertion error described below.
Here is an example. Suppose local_etype_offset is [0, 5, 10] and fanout is [1, 1]. If the node I sample is an inner node of this partition but the only edge at that node is an outer edge, the eid of that edge is likely to be greater than 10. The heterogenized_etype computed for this outer edge is then 2, and the assertion that follows fails with the message: et[et_idx[len-1]] < num_etypes (2 vs 2) etype values exceeding the number of fanouts.
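The arithmetic in this example can be reproduced with a small sketch. It assumes the edge type is found by a binary search over local_etype_offset (the last offset that is less than or equal to the eid), which is consistent with the numbers above but is not copied from rowwise_pick.h. The helper name etype_of and the concrete eids are hypothetical.

```python
from bisect import bisect_right


def etype_of(eid, local_etype_offset):
    """Index of the last offset <= eid (assumed lookup rule, not DGL's code)."""
    return bisect_right(local_etype_offset, eid) - 1


local_etype_offset = [0, 5, 10]  # two edge types: eids [0, 5) and [5, 10)
fanouts = [1, 1]
num_etypes = len(fanouts)

inner_eids = [3, 7]  # hypothetical inner edges, one per edge type
outer_eid = 12       # hypothetical outer edge whose eid lies past the last offset

inner_types = [etype_of(e, local_etype_offset) for e in inner_eids]
outer_type = etype_of(outer_eid, local_etype_offset)

# Mirrors the failing check: the edge type must be smaller than num_etypes.
check_passes = outer_type < num_etypes
```

Under this assumption the inner edges map to types 0 and 1, while the outer edge maps to type 2, which is not smaller than num_etypes and so fails the check.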

Environment

  • DGL Version (e.g., 1.0): 2.1
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): 2.3.0
  • OS (e.g., Linux): Linux
  • How you installed DGL (conda, pip, source): conda
  • Python version: 3.12.3

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

@Rhett-Ying
Collaborator

Rhett-Ying commented Jul 25, 2024

I don't think it's a bug.

During partitioning, we make sure all the edges of an inner_node are assigned to the current partition, and these edges are called inner edges.

@yfismine
Author

> I don't think it's a bug.
>
> During partitioning, we make sure all the edges of an inner_node are assigned to the current partition, and these edges are called inner edges.

Does that mean both the in-edges and the out-edges of an inner_node are in the same partition? And if those edges have features, are the features stored in multiple partitions?

@yfismine
Author

yfismine commented Aug 4, 2024

I used the provided partition_graph function and inspected the partitioned subgraphs. It is easy to find nodes that are inner_node but whose directly connected edges are not all inner_edge. If a node is an inner_node, all of its in-edges appear to be inner_edge, but the same is not necessarily true of its out-edges. I think your description may be inaccurate. This has probably not caused errors so far because neighbor samplers are usually defined in incoming-edge mode. @Rhett-Ying
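A check of this kind can be sketched as follows. It assumes the per-partition flags and edge sources have already been exported as plain Python lists indexed by local node id and local edge id. The helper name inner_nodes_with_outer_out_edges and the toy data are hypothetical and do not come from the test described above.

```python
def inner_nodes_with_outer_out_edges(src, inner_node, inner_edge):
    """Return inner nodes that have at least one out-edge not flagged as inner.

    src[eid] is the local source node of edge eid; inner_node and inner_edge
    are boolean flags indexed by local node id and local edge id.
    """
    found = set()
    for eid, u in enumerate(src):
        if inner_node[u] and not inner_edge[eid]:
            found.add(u)
    return sorted(found)


# Toy partition: nodes 0 and 1 are inner, node 2 belongs to another partition.
src = [2, 0, 1, 0]
dst = [0, 1, 0, 2]  # listed for readability only; the check needs sources
inner_node = [True, True, False]
inner_edge = [True, True, True, False]  # edge 3 (0 -> 2) points at a non-inner node

result = inner_nodes_with_outer_out_edges(src, inner_node, inner_edge)
```

In this toy case node 0 is reported: it is an inner node, all of its in-edges are inner edges, but its out-edge to node 2 is not.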

@Rhett-Ying
Collaborator

> > I don't think it's a bug.
> > During partitioning, we make sure all the edges of an inner_node are assigned to the current partition, and these edges are called inner edges.
>
> Does that mean both the in-edges and the out-edges of an inner_node are in the same partition? And if those edges have features, are the features stored in multiple partitions?

Yes.

@Rhett-Ying
Collaborator

Let me clarify further.

  1. Inner nodes are the nodes that belong to the current partition. They are the ones with inner_node=True. Node features are partitioned and saved according to inner_node.
  2. All in-edges of inner nodes are marked as inner edges. They are the ones with inner_edge=True. Edge features are partitioned and saved according to inner_edge.
  3. Because we save all in-edges of inner nodes, the partition may include some nodes that do not belong to it. These nodes have inner_node=False. Their node features are NOT saved in the feature data of the current partition.
  4. To make the out-degree of inner nodes available, all out-edges of inner nodes are also saved in the current partition. These edges have inner_edge=False. Their edge features are not saved in the current partition.
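The four rules above can be illustrated with a toy model. This is only a sketch of the rules as stated in this comment, not DGL's partitioning code; the function build_partition, the edge list, and the owner mapping are all hypothetical.

```python
def build_partition(edges, owner, part_id):
    """Apply the four rules to a global edge list for one partition.

    edges: list of (src, dst) pairs with global node ids.
    owner: dict mapping each node id to the partition that owns it.
    Returns (local_edges, inner_edge flags, inner_node flags by node id).
    """
    local_edges, inner_edge = [], []
    for u, v in edges:
        if owner[v] == part_id:
            # Rule 2: every in-edge of an inner node is an inner edge.
            local_edges.append((u, v))
            inner_edge.append(True)
        elif owner[u] == part_id:
            # Rule 4: out-edge of an inner node to another partition,
            # kept locally but not flagged as an inner edge.
            local_edges.append((u, v))
            inner_edge.append(False)
    # Rules 1 and 3: owned nodes are inner; other endpoints are not.
    nodes = {n for n in owner if owner[n] == part_id}
    nodes |= {n for e in local_edges for n in e}
    inner_node = {n: owner[n] == part_id for n in sorted(nodes)}
    return local_edges, inner_edge, inner_node


edges = [(0, 1), (1, 2), (2, 0), (3, 1)]
owner = {0: 0, 1: 0, 2: 1, 3: 1}
local_edges, inner_edge, inner_node = build_partition(edges, owner, part_id=0)
```

In this toy graph, partition 0 keeps all four edges. The edge (1, 2) leaves inner node 1 for a node owned by partition 1, so it is stored with inner_edge=False, which is the case the sampling discussion above is concerned with.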


github-actions bot commented Sep 8, 2024

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
