Support opening from URLs #66
Comments
If this is implemented, perhaps we should put a note in the |
To implement this we'd need to
It'll be fiddly, unfortunately. |
To increase the fiddliness, I think it would be really helpful to be able to show progress when downloading, if at all possible. Even if we don't know the file size beforehand, something that tells the user that the session hasn't just stalled is pretty useful for teaching purposes. |
That's surely feature creep - why not put in a bash cell that does the download to a local file using curl? |
ISWYM about feature creep. But how many tskit users (not devs) know about curl and bash? And do we even want them to know about that before they get started? We provide progress bars for tsinfer to give feedback too. I guess this could be a Zarr thing anyway. Presumably remote access to data, and feedback about time to complete, are on their agenda? |
They don't need to for your use case though, right? Either way, it's just a cell in the notebook that they execute, which leads to you having a TreeSequence object loaded. |
Yep, at the moment I'm just doing this in a cell:
import urllib.request
from tqdm import tqdm
import tszip

class DownloadProgressBar(tqdm):
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)

url = "https://zenodo.org/record/5512994/files/hgdp_tgp_sgdp_high_cov_ancients_chr2_q.dated.trees.tsz"
with DownloadProgressBar(unit='B', unit_scale=True,
                         miniters=1, desc=url.split('/')[-1]) as t:
    temporary_filename, _ = urllib.request.urlretrieve(url, reporthook=t.update_to)
    ts_2q = tszip.decompress(temporary_filename)
urllib.request.urlcleanup()  # remove temporary_filename

But it would be much cleaner to wrap that somehow
|
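One way the "wrap that somehow" idea could look is sketched below, using only the standard library so it does not depend on tqdm. The names `download_with_progress` and `print_progress` are hypothetical and not part of tszip or tskit. The callback receives `None` as the total when the server sends no `Content-Length` header, so it can still show that the download has not stalled.

```python
import urllib.request


def download_with_progress(url, dest, report=None, chunk_size=1 << 16):
    """Stream `url` into the file `dest`, reporting progress after each chunk.

    `report`, if given, is called as report(bytes_done, total), where
    `total` is None when the response carries no Content-Length header.
    Hypothetical helper for illustration, not an existing tszip function.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else None
        done = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            done += len(chunk)
            if report is not None:
                report(done, total)
    return dest


def print_progress(done, total):
    """Minimal text progress line; falls back to a byte count if size is unknown."""
    if total:
        print(f"\r{done / total:6.1%} of {total} bytes", end="", flush=True)
    else:
        print(f"\r{done} bytes downloaded", end="", flush=True)
```

A notebook cell would then shrink to something like `path = download_with_progress(url, "chr2_q.trees.tsz", report=print_progress)` followed by `tszip.decompress(path)`. A tqdm bar could be plugged in through the same `report` callback instead of `print_progress`.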
I just tried out my bash magic idea and it didn't work because there's no "live" update from the cell, so you only get the download progress at the very end. So you would have to do this via a Python package of some sort. |
The tqdm code above works a treat. But it's still a bit verbose, and users might baulk at having to understand it. It's not that satisfying to say "just paste this code and ignore how it works". So anything that would help wrap this into a more terse and comprehensible syntax would be good, I think. Perhaps @benjeffery has a good suggestion (he usually does!). Personally I don't think it's too bad to have tqdm as a tszip dependency. You could imagine, for instance, defining something like the DownloadProgressBar class as a tszip helper:
is already a lot cleaner IMO. But maybe there is an even terser way to do it? |
It makes no sense to add a general progress bar UI to a package that's for compressing tskit tree sequences. What you're looking for is a Python package that does a download with an integrated progress bar (which I agree would be very useful):
import yanspackage

url = "https://zenodo.org/record/5512994/files/hgdp_tgp_sgdp_high_cov_ancients_chr2_q.dated.trees.tsz"
filename = yanspackage.download(url, progress="notebook") |
OK, but I'm thinking of (FWIW for learning stuff, I rather dislike having to download files separately, before doing analysis, then fiddling around with coding where the files are stored, etc. I would much prefer it to appear as if I have streamed the download directly into the variables in my python session, and not have to think about clearing up disk space afterwards, or dealing with tmp directories. Perhaps I'm unusual like that, though?) |
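The "streamed directly into a variable" experience described above could be approximated as in the sketch below, where the temporary file is created and removed behind the scenes. The function name `load_from_url` and its `loader` parameter are hypothetical. The sketch assumes the loader reads the file fully and closes it before returning, so the file can safely be deleted afterwards.

```python
import os
import shutil
import tempfile
import urllib.request


def load_from_url(url, loader):
    """Download `url` to a temporary file, return loader(path), then clean up.

    `loader` is any callable taking a file path (for example tszip.decompress).
    Hypothetical helper for illustration; it assumes `loader` has finished
    reading and closed the file by the time it returns.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "download")
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        # The temporary directory (and the file in it) is removed on exit.
        return loader(path)
```

With this, the notebook cell becomes `ts_2q = load_from_url(url, tszip.decompress)`, and the user never sees a filename or a tmp directory. Progress reporting is left out here to keep the cleanup logic in focus.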
Discussion at tskit-dev/tskit#1566 (comment)