[BUG] Verify the downloaded spark tarball on Databricks #11580

Open
razajafri opened this issue Oct 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@razajafri
Collaborator

Describe the bug
I recently ran the Databricks tests with jenkins/databricks/test.sh, and they failed because the spark-3.2.0 tarball was corrupted during download.

Steps/Code to reproduce bug
Build databricks by running jenkins/databricks/build.sh
Run the tests by running jenkins/databricks/test.sh

Expected behavior
The tests should run, or the Spark 3.2.0 tests should be bypassed if the tarball is corrupt.
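
One way the bypass could look is sketched below. It is only an illustration: the helper name `tarball_is_readable`, the `SPARK_TGZ` variable, and the /tmp path are assumptions, not taken from jenkins/databricks/test.sh.

```shell
# Hypothetical helper, not part of the repository's scripts.
# Succeeds only if the gzip stream and the tar structure can be read
# from start to finish; a truncated download fails this check.
tarball_is_readable() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Assumed download location; override SPARK_TGZ to point elsewhere.
SPARK_TGZ="${SPARK_TGZ:-/tmp/spark-3.2.0-bin-hadoop3.2.tgz}"

if tarball_is_readable "$SPARK_TGZ"; then
    echo "tarball is readable, Spark 3.2.0 tests can run"
else
    echo "WARNING: $SPARK_TGZ is missing or corrupt, skipping Spark 3.2.0 tests" >&2
fi
```

Listing the archive catches truncated or damaged downloads, but it cannot tell a well-formed wrong file from the right one, so it is a weaker check than comparing checksums.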

razajafri added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Oct 9, 2024
razajafri changed the title from "[BUG] Check the tar before running the Databricks tests" to "[BUG] Verify the downloaded spark tarball on Databricks" on Oct 9, 2024
sameerz removed the "? - Needs Triage" (Need team to review and classify) label on Oct 15, 2024
@gerashegalov
Collaborator

There are a few places where we download archives manually without following the best practice of verifying the checksum. Verifying it would let the CI pipeline fail early with a meaningful error message.

In this case, the download is:

wget https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz -P /tmp

For this particular release, the published SHA-512 sum is in an unusual format:

https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz.sha512

spark-3.2.0-bin-hadoop3.2.tgz: EBE51A44 9EBD070B E7D35709 31044070 E53C2307
                               6ABAD233 B3C51D45 A7C99326 CF55805E E0D573E6
                               EB7D6A67 CFEF1963 CD77D6DC 07DD2FD7 0FD60DA9
                               D1F79E5E

so it cannot be passed directly to sha512sum -c.

Once the whitespace is removed and the hex digits are lowercased, however, it is the same checksum that sha512sum computes locally:

$ sha512sum -b spark-3.2.0-bin-hadoop3.2.tgz
ebe51a449ebd070be7d3570931044070e53c23076abad233b3c51d45a7c99326cf55805ee0d573e6eb7d6a67cfef1963cd77d6dc07dd2fd70fd60da9d1f79e5e *spark-3.2.0-bin-hadoop3.2.tgz
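
A minimal sketch of how a script could do that comparison automatically follows. The function names `normalize_sha512` and `verify_sha512` are hypothetical and do not exist in the repository; the sketch assumes the .sha512 file has the layout shown above (a "file name:" prefix followed by space-separated uppercase hex groups across several lines).

```shell
# Hypothetical helper: read a .sha512 file in the layout shown above on stdin
# and print the digest as one lowercase hex string.
normalize_sha512() {
    # Strip the "<file name>:" prefix, then all whitespace, then lowercase.
    sed 's/^[^:]*://' | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: compare the published digest with the local one.
# Usage: verify_sha512 <tarball> <sha512-file>
verify_sha512() {
    local tarball sums_file expected actual
    tarball="$1"
    sums_file="$2"
    expected="$(normalize_sha512 < "$sums_file")"
    # sha512sum prints "<digest> *<file>"; keep only the digest.
    actual="$(sha512sum -b "$tarball" | cut -d' ' -f1)"
    if [ "$expected" != "$actual" ]; then
        echo "ERROR: SHA-512 mismatch for $tarball" >&2
        echo "  expected: $expected" >&2
        echo "  actual:   $actual" >&2
        return 1
    fi
}
```

A CI script could then download both files and call `verify_sha512 /tmp/spark-3.2.0-bin-hadoop3.2.tgz /tmp/spark-3.2.0-bin-hadoop3.2.tgz.sha512 || exit 1` before unpacking, so that a corrupted download stops the pipeline with the mismatch message instead of a later, unrelated failure.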


3 participants