--overwrite should only overwrite if the local and remote md5s differ #37

Open
nebfield opened this issue Aug 1, 2024 · 0 comments
Labels
enhancement New feature or request

nebfield commented Aug 1, 2024

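Current overwrite behaviour is fairly dumb. If --overwrite is set, an existing file is always downloaded again, without checking whether the local copy already matches the remote md5: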

def https_download(*, url, out_path, directory, overwrite):
    """Download a file from the PGS Catalog over HTTPS, with automatic retries and
    waiting. md5 checksums are automatically validated."""
    try:
        if Config.FTP_EXCLUSIVE:
            logger.warning("HTTPS downloads disabled by Config.FTP_EXCLUSIVE")
            https_download.retry.wait = tenacity.wait_none()
            https_download.retry.stop = tenacity.stop.stop_after_attempt(1)
            raise ScoreDownloadError("HTTPS disabled")

        if out_path.exists() and not overwrite:
            raise FileExistsError(f"{out_path} already exists")

        checksum_path = url + ".md5"
        checksum = httpx.get(checksum_path, headers=Config.API_HEADER).text
        md5 = hashlib.md5()

        with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
            with httpx.stream("GET", url, headers=Config.API_HEADER) as r:
                for data in r.iter_bytes():
                    f.write(data)
                    md5.update(data)

            if (calc := md5.hexdigest()) != (remote := checksum.split()[0]):
                # will attempt to download again (see decorator)
                raise ScoreChecksumError(
                    f"Calculated checksum {calc} doesn't match {remote}"
                )
    except httpx.UnsupportedProtocol as protocol_exc:
        raise ValueError(f"Can't download a local file: {url!r}") from protocol_exc
    except httpx.RequestError as download_exc:
        raise ScoreDownloadError("HTTPS download failed") from download_exc
    else:
        # no exceptions thrown, move the temporary file to the final output path
        os.rename(f.name, out_path)
        logger.info(f"HTTPS download OK, {out_path} checksum validation passed")

Local and remote md5s can differ if the scoring file has been updated (e.g. a trait update caused by an upstream ontology change). Only in that case should --overwrite replace the local file.
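A rough sketch of what the check could look like, using only the standard library. The helper names `file_md5` and `should_download` are hypothetical and not part of the existing code. It assumes the remote `.md5` text has already been fetched and has the same layout that `checksum.split()[0]` relies on above (digest first, then optional filename).

```python
import hashlib
import pathlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a local file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def should_download(out_path, remote_checksum, overwrite):
    """Decide whether a download is needed (hypothetical helper).

    remote_checksum is the text of the remote .md5 file; only the first
    whitespace-separated token (the digest) is used.
    """
    out_path = pathlib.Path(out_path)
    if not out_path.exists():
        # nothing local yet, always download
        return True
    if not overwrite:
        # same behaviour as the current code
        raise FileExistsError(f"{out_path} already exists")
    # --overwrite set: only download again if the md5s differ
    remote = remote_checksum.split()[0]
    return file_md5(out_path) != remote
```

In `https_download` this could presumably replace the `out_path.exists() and not overwrite` check, placed after the remote checksum is fetched, returning early when `should_download(...)` is false. Whether skipping should be logged or treated as a successful download is left open here.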

nebfield added the enhancement label on Aug 1, 2024