Loading custom datasets #6

zoemeini · 2021-06-02T09:31:31Z

Hello everybody :)

I am working with a custom pipeline for link prediction on a graph. I build the graph by processing CSV data, and in the end I obtain an object of class torch_geometric.data.Dataset (the same class as the default datasets used in this repo, such as cora, protein, email...).

I would like to know which part of this repo's code I should modify to load my custom dataset object for link prediction.

Thank you very much!

Lemour-sudo · 2021-07-08T05:05:33Z

You may need to create a folder for your custom dataset inside data/, similar to the existing data/CiteSeer folder.
Then pass the name of that folder as the --dataset argument on the command line when calling the train.py script.
For example, if I save my custom dataset files as data/Custom-Data, I can point to this new dataset folder by calling:
python train.py --dataset Custom-Data
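Before launching training, it can help to check that the custom folder exists and to see which files from the CiteSeer folder it does not yet contain. The sketch below is only an illustration: `missing_files` and `train_command` are hypothetical helpers, not part of this repo, and it assumes (without confirmation from the code) that the loader looks for the same file names in every dataset folder.

```python
from pathlib import Path


def missing_files(reference_dir, custom_dir):
    """Return file names present in reference_dir but absent from custom_dir."""
    reference = {p.name for p in Path(reference_dir).iterdir() if p.is_file()}
    custom = {p.name for p in Path(custom_dir).iterdir() if p.is_file()}
    return sorted(reference - custom)


def train_command(dataset_name, data_root="data"):
    """Build the train.py call for a dataset folder, failing early if it is absent."""
    folder = Path(data_root) / dataset_name
    if not folder.is_dir():
        raise FileNotFoundError(f"expected a dataset folder at {folder}")
    # Same invocation as in the comment above: python train.py --dataset <name>
    return ["python", "train.py", "--dataset", dataset_name]
```

Calling `missing_files("data/CiteSeer", "data/Custom-Data")` from the repo root lists the file names you may still need to produce, and the list returned by `train_command("Custom-Data")` can be passed to `subprocess.run`.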

lucky6qi · 2021-09-21T21:47:05Z

I have also started working on loading custom datasets, but I don't know how to prepare my data in the file format used in the data/CiteSeer folder. Moreover, when I run python train.py --dataset CiteSeer, I get this error:

```
  File "train.py", line 117, in <module>
    loss = deal.default_loss(inputs, labels, data, thetas=theta_list, train_num=int(X_train.shape[0] *args.train_ratio)*2)
  File "/Users/liuqi7/deal/model.py", line 466, in default_loss
    dists = data.dists[nodes[:,0],nodes[:,1]]
TypeError: 'NoneType' object is not subscriptable
```

lajd · 2021-10-06T06:48:59Z

Hi @lucky6qi, I had the same issue. The solution is to download the dists-1.dat file, as described in data/CiteSeer/About dist data.
I created a fork to help with the setup instructions (see Installation here: https://github.com/lajd/DEAL/blob/master/README.md)

basudev-yadav · 2022-05-03T14:53:09Z

@lajd I looked at your code. It looks like you used datasets that are available in PyTorch Geometric. I want to run it on my own data, but I don't know how to prepare the data in the format the code uses, for example those NumPy zip files and the sparse matrix. I am having trouble understanding what those files represent and how they are built.

basudev-yadav · 2022-05-06T07:15:08Z

@lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

FatemeMirzaeii · 2022-11-17T17:41:29Z

> @lajd I looked at your code. It looks like you used datasets that are available in PyTorch Geometric. I want to run it on my own data, but I don't know how to prepare the data in the format the code uses, for example those NumPy zip files and the sparse matrix. I am having trouble understanding what those files represent and how they are built.

Same question. Have you found the answer?

fatemehkarimi · 2024-07-05T14:56:45Z

> @lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

Yes, it is the shortest path between each pair of nodes.
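To make that concrete, here is a minimal sketch of computing hop-count shortest paths between every pair of nodes with a breadth-first search from each node. It is not the repo's code: the function names are hypothetical, the graph is assumed to be undirected and unweighted, and the normalization shown (dividing by the largest finite distance) is only one possible scheme. How dists-1.dat is actually normalized and stored on disk is not confirmed in this thread.

```python
from collections import deque


def all_pairs_hops(num_nodes, edges):
    """Hop-count shortest paths for an undirected, unweighted graph.

    Returns a num_nodes x num_nodes list of lists; unreachable pairs are None.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    dists = [[None] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        dists[source][source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if dists[source][nxt] is None:  # first visit is the shortest path
                    dists[source][nxt] = dists[source][node] + 1
                    queue.append(nxt)
    return dists


def normalize(dists):
    """Scale finite distances into [0, 1] by the largest one (an assumed scheme)."""
    largest = max(d for row in dists for d in row if d is not None)
    if largest == 0:
        return [[None if d is None else 0.0 for d in row] for row in dists]
    return [[None if d is None else d / largest for d in row] for row in dists]
```

For a graph with edges `[(0, 1), (1, 2)]` and an isolated node 3, `all_pairs_hops(4, edges)` gives a distance of 2 between nodes 0 and 2 and `None` for any pair involving node 3. This runs one BFS per node, so for large graphs a library routine such as SciPy's `scipy.sparse.csgraph.shortest_path` is likely a better fit.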
