cached files need python version information in filenames #1551

Closed
petrelharp opened this issue Feb 23, 2024 · 15 comments

Comments

@petrelharp
Contributor

petrelharp commented Feb 23, 2024

Over on slack, Alan Izararras pointed out that running

import stdpopsim

species = stdpopsim.get_species("HomSap")
dfe = species.get_dfe("Gamma_K17")
model = species.get_demographic_model("OutOfAfrica_3G09")
contig = species.get_contig("chr22", genetic_map="HapMapII_GRCh37", mutation_rate=model.mutation_rate)
samples = {"YRI":100}
exons = species.get_annotations("ensembl_havana_104_exons")
exon_intervals = exons.get_chromosome_annotations("chr21")
contig.add_dfe(intervals=exon_intervals, DFE=dfe)

with version 0.2.0 now errors with

ValueError: Expected SHA256=b9df30b3a37cdd26ec625fd80968ff2e412810c045a11dddc458dc606c702c96, but downloaded file has 41160d5ad616d25be13e95fb44c523e2d82dffc942b835b979fdc7c4c01f6d8c.

I can confirm this.

So, currently doing pip install stdpopsim and running that code does not work.

I'm pretty sure that the problem is that we've replaced the files on Amazon with new ones that have different hashes, thus breaking previous versions of the software.
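
To illustrate why replacing a file in place breaks older releases, here is a minimal sketch of a pinned-checksum check done after download. It is not stdpopsim's actual code: the function names `sha256_of_file` and `verify_download` are hypothetical, and only the standard-library `hashlib` module is assumed.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Raise ValueError if the file's digest differs from the pinned one."""
    actual = sha256_of_file(path)
    if actual != expected_sha256:
        raise ValueError(
            f"Expected SHA256={expected_sha256}, but downloaded file has {actual}."
        )
```

Because the expected hash ships inside each released version, a release keeps demanding the old digest after the remote file has been overwritten, which would produce exactly this kind of error.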

I think the solution is to (a) put back the old files, and (b) add a naming convention to the files so we don't clobber previous ones.

Some suggestions for the naming convention (first suggestion best I think):

  1. For each file, just append a file-specific version number, like filename_v1.tar.gz. Then we just need to update the version number whenever we update the file, no fragile programming required (like in 2 and 3 below).

  2. directory structure on amazon like

0.1.0/
0.1.1/
...
0.2.0/
latest/

and the files within each of these would be symlinks to the previous one unless they had been updated. This would let us keep the actual file names the same (maybe bad).

  3. similar, but have each file present in various versions like filename_%version%.tar.gz, again with most of these being symlinks.
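
The per-file version suffix from the first suggestion could be sketched roughly as follows. The helper names `versioned_name` and `versioned_url`, and the `example.com` base URL in the usage note, are hypothetical placeholders and not part of stdpopsim.

```python
def versioned_name(stem, version, suffix=".tar.gz"):
    """Build a name like 'filename_v1.tar.gz' from a stem and an integer version."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"{stem}_v{version}{suffix}"


def versioned_url(base_url, stem, version, suffix=".tar.gz"):
    """Join a (hypothetical) base URL with the versioned file name."""
    base = base_url.rstrip("/")
    return f"{base}/{versioned_name(stem, version, suffix)}"
```

For example, `versioned_url("https://example.com/annotations/", "filename", 2)` gives `https://example.com/annotations/filename_v2.tar.gz`. Since each release would pin both the name and the hash, uploading `filename_v2.tar.gz` leaves `filename_v1.tar.gz` untouched for older releases.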
@petrelharp
Contributor Author

The change in checksums went in with #1537. @chriscrsmith do you happen to have the old annotations?

@petrelharp
Contributor Author

Hm, so this means we could recreate the old files by using the pre-#1522 maintenance script.

@nspope
Collaborator

nspope commented Feb 23, 2024

I've got a copy of the old annotation here, I think:
cache-stdpopsim-b9df30.tar.gz

@petrelharp
Contributor Author

Note: the cache directory is what's returned by

import appdirs
appdirs.user_cache_dir("stdpopsim")

@petrelharp
Contributor Author

@nspope's file has all the HomSap/ensembl_havana_104_exons files, but we still need HomSap CDS and both exons and CDS for AraTha and DroMel.

@lntran26
Member

lntran26 commented Mar 5, 2024

I also ran into this error while trying to run the human production config for the analysis2 paper. I followed the Development installation guide here but still have the same error. This is the version I'm on when checking with print(stdpopsim.__version__): 0.2.1.dev58+gd923127. Not sure how to/if I should get on the 0.2.1.dev99+gfbb94c2 version @petrelharp mentioned on Slack.

@petrelharp
Contributor Author

petrelharp commented Mar 5, 2024

We need:

235105850a365f7f171c459d4c5ee50483e0c17e3b2b0232412d22addca9915f
c1e1c17a0bf3591e91a4cf85a0ad964d1e9205cc2788c40f855870c589cacca7
f09b4684505c7c8c86c7739d632c79927b8d329fd60fd55ea3f610b944bb5856
b993c8fc997e1c7ecdad626b7eeceae724cc0e0e477d8ab2f186866a6a0def15
b9df30b3a37cdd26ec625fd80968ff2e412810c045a11dddc458dc606c702c96
237801b39642b91e733e155dc62960c0b097d3c144eea408327e9dc5f1ac84ae

@lntran26
Member

lntran26 commented Mar 5, 2024

Update: after reinstalling the github Development version using pip install . as mentioned in #1552 I am now on 0.2.1.dev99+gfbb94c2 and no longer have this error. Checking the cache directory using cat $CACHEDIR/annotations/*/*/*.sha256 I have two (the second one was the old one I had prior to reinstallation):

0562bb9fb7d74625a52cd32360066bab0fe1188aaf5022707299fb69eb2b930d
b9df30b3a37cdd26ec625fd80968ff2e412810c045a11dddc458dc606c702c96

which may be why I have this warning
/home/u30/lnt/micromamba/envs/analysis2/lib/python3.10/site-packages/stdpopsim/cache.py:169: UserWarning: Error occured renaming map directory. Are multiple processes downloading this map at the same time?
when running the simulation, but it still successfully ran.
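
The same check as the `cat` command above can be sketched in Python. The function name `cached_annotation_hashes` is hypothetical. The directory layout `annotations/*/*/*.sha256` is taken from that command, and each `.sha256` file is assumed to hold just the hash, as the output above suggests.

```python
from pathlib import Path


def cached_annotation_hashes(cache_dir):
    """Map each *.sha256 file under annotations/ to the hash text it contains."""
    root = Path(cache_dir)
    return {
        # Key by path relative to the cache root, with forward slashes.
        path.relative_to(root).as_posix(): path.read_text().strip()
        for path in sorted(root.glob("annotations/*/*/*.sha256"))
    }
```

Passing the directory returned by `appdirs.user_cache_dir("stdpopsim")` as `cache_dir` would list every cached annotation hash, which makes a stale entry left over from an earlier install easy to spot.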

@petrelharp
Contributor Author

Thanks - that all seems as expected; perhaps the warning could be more informative, though.

@chriscrsmith
Contributor

chriscrsmith commented Mar 6, 2024

I have AraTha and DroMel

235105850a365f7f171c459d4c5ee50483e0c17e3b2b0232412d22addca9915f
c1e1c17a0bf3591e91a4cf85a0ad964d1e9205cc2788c40f855870c589cacca7
f09b4684505c7c8c86c7739d632c79927b8d329fd60fd55ea3f610b944bb5856
b993c8fc997e1c7ecdad626b7eeceae724cc0e0e477d8ab2f186866a6a0def15

edit:
and the other two

@chriscrsmith
Contributor

@petrelharp
Contributor Author

petrelharp commented Mar 6, 2024

@petrelharp
Contributor Author

Update: we have uploaded the files above to AWS, so 0.2.0 should now work. However, this means that GitHub HEAD is now broken; fixing that is next on the list.

@petrelharp
Contributor Author

Okay, so I'm proposing doing

For each file, just append a file-specific version number, like filename_v1.tar.gz. Then we just need to update the version number whenever we update the file, no fragile programming required (like in the other two suggestions).

If we do this, then the maintenance/annotation_maint.py script should bump the version number.
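
A rough sketch of what such a version bump could look like, assuming file names end in `_vN.tar.gz` as in the example above. The helper `bump_file_version` is hypothetical and is not taken from maintenance/annotation_maint.py.

```python
import re

# Matches names of the form '<stem>_v<N>.tar.gz'.
_VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)(?P<suffix>\.tar\.gz)$")


def bump_file_version(filename):
    """Return `filename` with its trailing _vN version incremented by one."""
    match = _VERSION_RE.match(filename)
    if match is None:
        raise ValueError(f"No '_vN.tar.gz' suffix found in {filename!r}")
    new_version = int(match.group("version")) + 1
    return f"{match.group('stem')}_v{new_version}{match.group('suffix')}"
```

With this, `bump_file_version("filename_v1.tar.gz")` yields `filename_v2.tar.gz`, and a name without a version suffix raises an error rather than silently overwriting an existing file.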

@petrelharp
Contributor Author

Closed in #1553.
