Loading custom datasets #6

Open
zoemeini opened this issue Jun 2, 2021 · 7 comments

Comments

@zoemeini

zoemeini commented Jun 2, 2021

Hello everybody :)

I am working on a custom pipeline for link prediction in a graph. I build the graph by processing CSV data, and in the end I obtain an object of class torch_geometric.data.Dataset (the same class as the default datasets used in this repo, such as cora, protein, email...).

I would like to know which part of the code in this repo I should modify to load my custom dataset object for link prediction.

Thank you very much!

@Lemour-sudo

You may need to create a folder for your custom dataset in the data/ folder, similar to the CiteSeer folder under data/.
Then pass the name of your custom data folder as a command-line argument when calling the train.py script.
For example, if I save my custom dataset files under data/Custom-Data, I can point to this new dataset folder by calling:
python train.py --dataset Custom-Data

@lucky6qi

lucky6qi commented Sep 21, 2021

I have also started working on loading custom datasets, but I don't know how to prepare my data in the file format used in the data/CiteSeer folder. Moreover, when I run python train.py --dataset CiteSeer, I get the following error:

```
  File "train.py", line 117, in <module>
    loss = deal.default_loss(inputs, labels, data, thetas=theta_list, train_num=int(X_train.shape[0] *args.train_ratio)*2)
  File "/Users/liuqi7/deal/model.py", line 466, in default_loss
    dists = data.dists[nodes[:,0],nodes[:,1]]
TypeError: 'NoneType' object is not subscriptable
```

@lajd

lajd commented Oct 6, 2021

Hi @lucky6qi, I had the same issue. The solution is to download the dists-1.dat file, as described in data/CiteSeer/About dist data.
I created a fork to assist with the setup instructions (see Installation here https://github.com/lajd/DEAL/blob/master/README.md)
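
A quick pre-flight check can surface the missing file before training fails with the `'NoneType' object is not subscriptable` error. This is only a minimal sketch: the helper name `find_missing_files` is hypothetical, and it assumes the downloaded `dists-1.dat` is expected directly inside the dataset folder, which the thread does not confirm.

```python
from pathlib import Path


def find_missing_files(dataset_dir, required=("dists-1.dat",)):
    """Return the required file names that are absent from dataset_dir.

    Hypothetical helper: the assumption that dists-1.dat sits directly
    inside the dataset folder is not confirmed by the repo.
    """
    root = Path(dataset_dir)
    return [name for name in required if not (root / name).is_file()]


# Example: check the CiteSeer folder mentioned in this thread.
missing = find_missing_files("data/CiteSeer")
if missing:
    print("Missing files:", ", ".join(missing))
```

If the list is non-empty, download the file first and rerun the check before calling train.py.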

@basudev-yadav

basudev-yadav commented May 3, 2022

@lajd I looked at your code. It looks like you used datasets available in PyTorch Geometric. I want to run it on my own data, but I don't know how to prepare the data in the format the code expects, for example the NumPy zip files and sparse matrices. I am having trouble understanding what those files represent and how they are built.
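
One way to start understanding such files is to open an .npz archive with NumPy and list the arrays it stores. The sketch below is a generic inspection helper, not the repo's loader: `describe_npz` is a hypothetical name, and the array names in the demo (`indices`, `indptr`) are placeholders chosen for illustration, not the keys the repo's files actually use.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    # allow_pickle=False is NumPy's default; archives that store Python
    # objects would raise an error here instead of being unpickled.
    with np.load(path, allow_pickle=False) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Demo on a throwaway archive; the array names are placeholders only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez(demo_path, indices=np.array([0, 2, 1]), indptr=np.array([0, 1, 3]))
    summary = describe_npz(demo_path)
    print(summary)
```

Running the same helper on a file from data/CiteSeer would show which arrays it holds and their shapes, which is a reasonable first step before trying to build equivalent files from custom data.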

@basudev-yadav

@lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

@FatemeMirzaeii

> @lajd I looked at your code. It looks like you used datasets available in PyTorch Geometric. I want to run it on my own data, but I don't know how to prepare the data in the format the code expects, for example the NumPy zip files and sparse matrices. I am having trouble understanding what those files represent and how they are built.

Same question. Have you found the answer?

@fatemehkarimi

> @lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

Yes, it contains the shortest-path distance between each pair of nodes.
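
For a custom graph, all-pairs shortest-path distances can be computed with a breadth-first search from every node. The following is a minimal sketch, assuming an undirected, unweighted graph given as an edge list with nodes numbered from 0. The function name and the use of `math.inf` for unreachable pairs are illustrative choices; whether the repo normalizes these distances, and what on-disk layout dists-1.dat uses, is not confirmed in this thread.

```python
import math
from collections import deque


def all_pairs_shortest_paths(num_nodes, edges):
    """Return a num_nodes x num_nodes table of hop distances.

    Unreachable pairs are marked with math.inf (an illustrative choice).
    """
    # Build an undirected adjacency list.
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    dists = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        # Breadth-first search from src visits nodes in order of hop count.
        dists[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dists[src][v] == math.inf:
                    dists[src][v] = dists[src][u] + 1
                    queue.append(v)
    return dists


# Path graph 0-1-2-3 plus an isolated node 4.
edges = [(0, 1), (1, 2), (2, 3)]
dists = all_pairs_shortest_paths(5, edges)
print(dists[0][3], dists[0][4])
```

This runs one search per node, so it suits small and medium graphs; for large graphs a sparse-matrix routine would likely be a better fit.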
