should_update without internet connection #888

Open
drodarie opened this issue Sep 24, 2024 · 3 comments
Labels
question Further information is requested

Comments

@drodarie
Contributor

drodarie commented Sep 24, 2024

I have a file (1.json from the Allen website) that is stored locally on my machine by the cache system.
Yet every time I fetch the file with bsb, it checks the file's metadata, which means it has to fetch the remote counterpart from the internet to compare.

The question is: should we be able to bypass this check if the internet connection is down but the file exists locally?
This has become an issue on clusters, where internet access is cut off once you launch a job.

@drodarie drodarie added the question Further information is requested label Sep 24, 2024
@Helveg
Contributor

Helveg commented Sep 28, 2024

Could you post the relevant code for me? I'd like to check what you mean by "checks the meta of the file". The best way to deal with this is probably to set a shorter timeout on the connection attempt and to fall back as gracefully as possible. Indeed, if we have a cached file, we could continue with the cached version after falling back.
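
A minimal sketch of the timeout-and-fallback idea, using only the standard library. It is not bsb's implementation: `load_with_fallback`, its parameters, and the injectable `opener` are hypothetical names chosen for illustration, and the real FileDependency code may be organised quite differently.

```python
import urllib.request
from pathlib import Path


def load_with_fallback(url, cache_path, timeout=3.0, opener=urllib.request.urlopen):
    """Return the file bytes, preferring a fresh download but tolerating no network.

    Hypothetical helper: `opener` is injectable so the network can be faked in tests.
    """
    cache_path = Path(cache_path)
    try:
        # Short timeout so an offline cluster node fails fast instead of hanging.
        with opener(url, timeout=timeout) as response:
            data = response.read()
    except OSError:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        if cache_path.exists():
            # Network is down but a cached copy exists: continue with it.
            return cache_path.read_bytes()
        # No network and no cache: nothing sensible to fall back to.
        raise
    # Download succeeded: refresh the cached copy for later offline runs.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

Catching `OSError` is a deliberately broad choice here; a real implementation would probably want to log the failure so users know they are running on a possibly stale copy.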

I'm not sure the Allen partition can operate without access to such a file at all, so I'm not sure what we could do on a machine without an internet connection. We could let the user provide the file manually? We make an OfflineAllenPartition ;p

@drodarie
Contributor Author

drodarie commented Oct 3, 2024

> Could you post the relevant code for me?

The function that requests the JSON file is AllenStructure._dl_structure_ontology in bsb-core/bsb/topology/partition.py.
In short, it relies on a FileDependency which, when you load the file content, checks the file's metadata to see whether it needs to be updated.

> I'm not sure if the Allen partition can operate without access to such a file at all?

Indeed no, it is a requirement. The precise situation is that we are launching a job on a cluster node that has no internet access. So we downloaded the file in advance with bsb (so that it is stored properly in the cache folder), and it should not need to be updated.

> we could let the user provide the file manually?

Yes, I think this is the safest option. But just out of curiosity, I was wondering why the code fails even though the file is available locally (in the bsb cache folder).

@Helveg
Contributor

Helveg commented Oct 4, 2024

I wouldn't know without taking a deeper look at the code, for which I don't have the time :( There might be a couple of causes. I'm assuming that the cached files get a hashed filename? If so, the hash might differ between machines or, even worse, between Python processes (which would mean the cache kind of sucks). Another cause might be that the whole file dependency code is probably far too branchy and complex, and that some of the branches don't use the cache?

In any case, the true way out of the woods with complicated branchy code is to add unit tests that spy on whether the cached file is hit and assert that the code doesn't fetch the remote file again when it has it cached.
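
As a rough illustration of what such spying tests could look like, here is a sketch built on `unittest.mock`. `CachedFile` is a toy stand-in invented for this example, not bsb's FileDependency; the point is only the testing pattern of wrapping the fetch in a `Mock` and asserting on its call count.

```python
from unittest import mock


class CachedFile:
    """Toy stand-in for a cached remote file (hypothetical, not bsb's FileDependency)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that retrieves the remote content
        self._content = None     # in-memory cache, empty until first access

    def get_content(self):
        # Only hit the remote when nothing is cached yet.
        if self._content is None:
            self._content = self._fetch()
        return self._content


def test_cache_hit_does_not_refetch():
    fetch = mock.Mock(return_value=b"{}")
    dep = CachedFile(fetch)
    assert dep.get_content() == b"{}"
    assert dep.get_content() == b"{}"
    # The spy proves the second access was served from the cache.
    fetch.assert_called_once()


def test_cached_content_survives_network_loss():
    fetch = mock.Mock(return_value=b"{}")
    dep = CachedFile(fetch)
    dep.get_content()
    # Simulate the connection going down after the first download.
    fetch.side_effect = OSError("offline")
    assert dep.get_content() == b"{}"
    fetch.assert_called_once()
```

Against the real code base, the same pattern would presumably patch whatever function performs the HTTP request, so that every branch of the file dependency logic can be checked for unexpected remote calls.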

You can add a config attribute to provide the file.
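
One possible shape for that, sketched with a plain dataclass rather than bsb's configuration system: `OntologySource`, `local_file`, and `download` are all hypothetical names, and how this would map onto actual bsb config nodes is left open.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class OntologySource:
    """Hypothetical holder for where the structure ontology comes from."""

    download: Callable[[], bytes]      # fallback that fetches the file remotely
    local_file: Optional[str] = None   # hypothetical user-provided path

    def load(self):
        # A user-provided file always wins, so no network access is attempted.
        if self.local_file is not None:
            return json.loads(Path(self.local_file).read_text())
        return json.loads(self.download())
```

With `local_file` set, the download callable is never invoked, which is the behaviour an offline cluster node would need.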
