Can not read and pass on the results #30

Open
Nightrord opened this issue Aug 21, 2016 · 13 comments
Comments

@Nightrord

I ran the Python bh_tsne wrapper on a 95 × 745544 matrix with this command: ./bhtsne.py -i ~/Dropbox/github/data/lan_uid_matrix.txt -o ~/Dropbox/github/data/lan_uid_coordinate.txt -p 5 -d 2 -t 1 -v

It fails with the following error:
Error: could not open data file.
Traceback (most recent call last):
File "./bhtsne.py", line 233, in <module>
exit(main(argv))
File "./bhtsne.py", line 224, in main
verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
File "./bhtsne.py", line 211, in run_bh_tsne
for result in bh_tsne(tmp_dir_path, verbose):
File "./bhtsne.py", line 164, in bh_tsne
with open(path_join(workdir, 'result.dat'), 'rb') as output_file:
IOError: [Errno 2] No such file or directory: '/var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpakdFT0/result.dat'

I don't know why it cannot find result.dat. Could you help me solve it?

Thanks in advance

@gpapadop79

result.dat is created in a temporary folder. In your case it tries to create it in /var.

Check if your user has permission to create folders in /var

@Nightrord
Author

@gpapadop79 thanks for the info. The input matrix is large (95 rows and 745544 columns); could running out of memory cause this problem? Also, since the folder is a temporary one, how do I check whether I have permission to create folders in it?

Thanks in advance

@lvdmaaten
Owner

Is the t-SNE algorithm itself actually being run? Like do you see a loss being printed every 50 iterations or something like that? The result file should actually be really small (95x2 matrix), so I would be surprised if this were an OOM problem.

To check permissions, you can do something like ls -la /var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpakdFT0 and confirm that the write bit (w) is set for all three sets of users (owner, group, other). Alternatively, try running this with sudo to see if that helps.

@rohit-gupta Perhaps it makes sense to have an input option to specify the folder for intermediate results? I am not a Python user, so I am not familiar with the exact behavior of mkdtemp().
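For reference, a minimal sketch of how Python's tempfile.mkdtemp() behaves and how a caller-chosen folder could be passed to it. The helper names make_workdir and describe_dir are hypothetical and not part of bhtsne.py; only tempfile.mkdtemp(dir=...), os.access and os.stat are standard library calls.

```python
import os
import tempfile


def make_workdir(parent=None):
    """Create a fresh scratch directory, optionally under a caller-chosen parent.

    With parent=None, mkdtemp uses tempfile.gettempdir(), which honours the
    TMPDIR, TEMP and TMP environment variables before falling back to a
    platform default.
    """
    return tempfile.mkdtemp(dir=parent)


def describe_dir(path):
    """Report whether the current user can write into an existing directory."""
    return {
        "exists": os.path.isdir(path),
        "writable": os.access(path, os.W_OK),
        # Permission bits only, e.g. '0o700' for a directory private to its owner.
        "mode": oct(os.stat(path).st_mode & 0o777),
    }
```

Calling describe_dir(make_workdir()) should report a writable directory for the current user, since mkdtemp creates the directory on that user's behalf. If that holds, a permission problem on the temporary folder looks less likely, and the missing file is more likely never to have been produced.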

@Nightrord
Author

The t-SNE algorithm doesn't run; it immediately shows the error: IOError: [Errno 2] No such file or directory: '/var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpllxm8j/result.dat'. I think the problem is that the input matrix is so large. I will try it on another machine.

@lvdmaaten
Owner

Can you please copy-paste the full output? Does the data.dat file get written by the Python wrapper?

@Nightrord
Author

Here is the full output:
~/Projects/bhtsne (master*) $ python bhtsne.py -i ~/Projects/tsne_python/lan_uid_matrix_tsne1.txt -o ~/Dropbox/github/data/lan_uid_coordinate.txt -p 5 -d 2 -t 1 -v

Error: could not open data file.
Traceback (most recent call last):
File "bhtsne.py", line 233, in <module>
exit(main(argv))
File "bhtsne.py", line 224, in main
verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
File "bhtsne.py", line 211, in run_bh_tsne
for result in bh_tsne(tmp_dir_path, verbose):
File "bhtsne.py", line 164, in bh_tsne
with open(path_join(workdir, 'result.dat'), 'rb') as output_file:
IOError: [Errno 2] No such file or directory: '/var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpllxm8j/result.dat'

The problem is that it cannot find the intermediate file result.dat.

@lvdmaaten
Owner

The way the code works is: (1) Python wrapper writes data.dat, (2) binary runs t-SNE on data.dat, (3) binary writes results into result.dat, and (4) Python wrapper reads result.dat. Therefore, we first need to determine in which step things go wrong. The output suggests the problem is actually in step 1. Can you confirm by checking whether or not data.dat gets written?
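The four steps above can be narrowed down by looking at which files exist in the working directory. Below is a minimal sketch of such a check. The function name diagnose_workdir is hypothetical; the file names data.dat and result.dat are the ones mentioned in this thread. The check has to run while the temporary directory still exists, because it may be cleaned up once the wrapper exits.

```python
import os


def diagnose_workdir(workdir):
    """Guess which pipeline step failed from the files present in workdir."""
    data_path = os.path.join(workdir, "data.dat")
    result_path = os.path.join(workdir, "result.dat")

    # Step 1: the Python wrapper is expected to write data.dat.
    if not os.path.isfile(data_path):
        return "step 1: data.dat was not written by the wrapper"
    if os.path.getsize(data_path) == 0:
        return "step 1: data.dat exists but is empty"

    # Steps 2-3: the binary reads data.dat and is expected to write result.dat.
    if not os.path.isfile(result_path):
        return "steps 2-3: the binary did not produce result.dat"

    # Step 4: both files are there, so any remaining failure is on the read side.
    return "step 4: both files exist; check how result.dat is read"
```

Pointing it at the directory named in the IOError (for example the tmp... folder from the traceback) should say whether the failure is on the wrapper side or the binary side.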

@Nightrord
Author

The file data.dat has been written, but there is no result.dat. I think the problem happens in step 3.

@Nightrord
Author

Nightrord commented Aug 22, 2016

I ran the same data on another computer and hit the same problem. Here is the output:

Traceback (most recent call last):
File "bhtsne.py", line 233, in <module>
exit(main(argv))
File "bhtsne.py", line 224, in main
verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
File "bhtsne.py", line 206, in run_bh_tsne
init_bh_tsne(input_file, tmp_dir_path, no_dims=no_dims, perplexity=perplexity, theta=theta, randseed=randseed,verbose=verbose, initial_dims=initial_dims, use_pca=use_pca, max_iter=max_iter)
File "bhtsne.py", line 118, in init_bh_tsne
cov_x = np.dot(np.transpose(samples), samples)
MemoryError
Error: could not open data file.
Traceback (most recent call last):
File "bhtsne.py", line 233, in <module>
exit(main(argv))
File "bhtsne.py", line 224, in main
verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
File "bhtsne.py", line 211, in run_bh_tsne
for result in bh_tsne(tmp_dir_path, verbose):
File "bhtsne.py", line 164, in bh_tsne
with open(path_join(workdir, 'result.dat'), 'rb') as output_file:
IOError: [Errno 2] No such file or directory: '/tmp/tmpiXA4u_/result.dat'

It still cannot find result.dat, and this time it also reports a MemoryError.
Also, this time the data.dat file has not been written.
I think the problem is that the Python wrapper does not write data.dat.
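A rough back-of-the-envelope check of why the np.dot(np.transpose(samples), samples) line in the traceback can raise MemoryError. This is a sketch under two assumptions: that samples has shape (95, 745544), as described earlier in the thread, and that it is stored as 8-byte floats. The helper name dense_matrix_bytes is hypothetical.

```python
def dense_matrix_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols matrix of itemsize-byte numbers."""
    return rows * cols * itemsize


n_samples, n_features = 95, 745544

# The input itself: 95 x 745544 values.
input_bytes = dense_matrix_bytes(n_samples, n_features)

# samples^T . samples is n_features x n_features.
covariance_bytes = dense_matrix_bytes(n_features, n_features)

# samples . samples^T would only be n_samples x n_samples.
gram_bytes = dense_matrix_bytes(n_samples, n_samples)

print(f"input:      {input_bytes / 1e6:.1f} MB")
print(f"covariance: {covariance_bytes / 1e12:.2f} TB")
print(f"gram:       {gram_bytes / 1e3:.1f} kB")
```

Under these assumptions the feature-by-feature product needs about 4.4 TB, while the input is under 600 MB. That would explain a MemoryError before data.dat is written, followed by the binary failing to open a data file that was never created. Reducing the number of columns beforehand, or skipping this step if the use_pca argument visible in the traceback controls it, would avoid the allocation.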

@EMCP

EMCP commented Jun 15, 2017

@lvdmaaten, just wanted to ask: for those of us using the wrapper from within another Python program, we should not expect to see a data.dat file, since the data is being fed in via the call to run_bh_tsne, correct?

@lvdmaaten
Owner

The Python wrapper writes a data.dat file and then calls the bh_tsne binary. The binary writes the results to a result.dat file, which the wrapper reads in. Afterwards, the wrapper deletes both .dat files. So you would expect to see a data.dat file whilst the binary is running.

The whole thing is pretty clunky... I've been meaning to change this to an FFI call, but I haven't got around to doing that yet.

@EMCP

EMCP commented Jun 15, 2017

Understood. I guess I'm not seeing a data.dat then, but that's likely related to the NumPy crash I'm experiencing on OS X...

Edit: After installing OpenBLAS and compiling NumPy, the wrapper works (except inside a Jupyter notebook, but that's an understood limitation, I think).

One last thing, @lvdmaaten: does FFI mean https://cffi.readthedocs.io/en/latest/overview.html ?

@lvdmaaten
Owner

FFI = Foreign function interface

I think there exist several FFIs for Python; I've been using ctypes in the past.
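To illustrate what a ctypes call looks like, here is a minimal sketch that hands an in-memory array of doubles to a C function without going through files. It uses memcpy from the C standard library purely as a stand-in and assumes a POSIX system. It does not reflect any actual entry point of the bh_tsne code, and the helper name copy_doubles is hypothetical.

```python
import ctypes
import ctypes.util

# Load the C standard library. On POSIX, CDLL(None) also works as a fallback,
# which is what happens here if find_library returns None.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: void *memcpy(void *dest, const void *src, size_t n)
libc.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
libc.memcpy.restype = ctypes.c_void_p


def copy_doubles(values):
    """Copy a Python sequence of floats through a C call and return the result."""
    n = len(values)
    src = (ctypes.c_double * n)(*values)   # C array initialised from Python floats
    dst = (ctypes.c_double * n)()          # zero-initialised C array
    libc.memcpy(dst, src, ctypes.sizeof(src))
    return list(dst)
```

The same pattern (declare argtypes and restype, build a C array, call the function) is how a wrapper could pass data directly to a compiled library instead of writing data.dat and reading result.dat.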
