RHEL 8 x86_64 repository wrong package file size #2

Closed
pjgeorg opened this issue Nov 11, 2021 · 3 comments

@pjgeorg
pjgeorg commented Nov 11, 2021

Starting recently (the exact point in time is unknown), syncing the rhel8/x86_64 repository (http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64) fails:

Error message:

[MIRROR] cuda-compat-10-1-418.226.00-1.x86_64.rpm: Interrupted by header callback: Server reports Content-Length: 5713804 but expected size is: 5713800
[FAILED] cuda-compat-10-1-418.226.00-1.x86_64.rpm: No more mirrors to try - All mirrors were already tried without success
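
For anyone scripting a mirror sync, the mismatch lines can be pulled out of a log mechanically. The following is a minimal sketch, assuming the message format is exactly as quoted above; the function name `parse_size_mismatches` is hypothetical and not part of dnf or any NVIDIA tooling.

```python
import re

# Matches the size-mismatch line exactly as quoted in this issue.
# Assumption: the wording of the message does not vary between versions.
MISMATCH_RE = re.compile(
    r"^\[MIRROR\]\s+(?P<package>\S+): Interrupted by header callback: "
    r"Server reports Content-Length: (?P<reported>\d+) "
    r"but expected size is: (?P<expected>\d+)$"
)


def parse_size_mismatches(log_text):
    """Return {package: (reported_size, expected_size)} for each mismatch line."""
    mismatches = {}
    for line in log_text.splitlines():
        match = MISMATCH_RE.match(line.strip())
        if match:
            # Repeated retries of the same package collapse into one entry.
            mismatches[match["package"]] = (
                int(match["reported"]),
                int(match["expected"]),
            )
    return mismatches
```

Here "reported" is the Content-Length sent by the server and "expected" is the size recorded in the repository metadata.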

After downloading the file manually, its size is indeed 5713804 bytes.
Checking the file's signature and digests:

rpm -Kv cuda-compat-10-1-418.226.00-1.x86_64.rpm 
cuda-compat-10-1-418.226.00-1.x86_64.rpm:
    Header V4 RSA/SHA512 Signature, key ID 7fa2af80: OK
    Header SHA256 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: OK
    V4 RSA/SHA512 Signature, key ID 7fa2af80: OK
    MD5 digest: OK

The same happens trying to download or install this package using dnf:

dnf download cuda-compat-10-1-418.226.00-1.x86_64
Last metadata expiration check: 0:00:08 ago on 2021-11-11T14:08:02 CET.
[MIRROR] cuda-compat-10-1-418.226.00-1.x86_64.rpm: Interrupted by header callback: Server reports Content-Length: 5713804 but expected size is: 5713800
[MIRROR] cuda-compat-10-1-418.226.00-1.x86_64.rpm: Interrupted by header callback: Server reports Content-Length: 5713804 but expected size is: 5713800
[MIRROR] cuda-compat-10-1-418.226.00-1.x86_64.rpm: Interrupted by header callback: Server reports Content-Length: 5713804 but expected size is: 5713800
[MIRROR] cuda-compat-10-1-418.226.00-1.x86_64.rpm: Interrupted by header callback: Server reports Content-Length: 5713804 but expected size is: 5713800
[FAILED] cuda-compat-10-1-418.226.00-1.x86_64.rpm: No more mirrors to try - All mirrors were already tried without success

I assume there is some error with the repodata?
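
One way to test that assumption is to compare the package size recorded in the repository's primary.xml with the size actually observed. The sketch below assumes the decompressed primary.xml content is already available as a string and that the observed sizes were collected separately (for example from local files or Content-Length headers); `find_size_mismatches` is a hypothetical helper, not an existing tool.

```python
import xml.etree.ElementTree as ET

# XML namespace of the <package> entries in repodata primary.xml.
COMMON_NS = {"c": "http://linux.duke.edu/metadata/common"}


def find_size_mismatches(primary_xml, actual_sizes):
    """Compare <size package="..."> in primary.xml with observed sizes.

    primary_xml:  decompressed primary.xml content.
    actual_sizes: dict mapping a package's location href to its real
                  size in bytes. Packages missing from it are skipped.
    Returns a list of (href, expected_size, actual_size) tuples.
    """
    root = ET.fromstring(primary_xml)
    mismatches = []
    for pkg in root.findall("c:package", COMMON_NS):
        href = pkg.find("c:location", COMMON_NS).get("href")
        expected = int(pkg.find("c:size", COMMON_NS).get("package"))
        actual = actual_sizes.get(href)
        if actual is not None and actual != expected:
            mismatches.append((href, expected, actual))
    return mismatches
```

For the package in this report, such a check would be expected to return an entry with 5713800 as the metadata size and 5713804 as the observed size.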

@pjgeorg pjgeorg changed the title RHEL 8 x86_64 repository wrong size RHEL 8 x86_64 repository wrong package file size Nov 11, 2021
@kmittman
Collaborator

Hi @pjgeorg, I'm taking a look. A preliminary check does seem to point to a metadata and/or CDN issue, so I am moving this from the yum-packaging-precompiled-kmod to the cuda-repo-management git repository for tracking.

@kmittman kmittman transferred this issue from NVIDIA/yum-packaging-precompiled-kmod Nov 11, 2021
@kmittman
Collaborator

The root cause was some incorrect metadata introduced last month during the 418.226.00 posting.

rhel8/x86_64/cuda-compat-10-1-418.226.00-1.x86_64.rpm
rhel8/x86_64/cuda-drivers-418.226.00-1.x86_64.rpm
rhel8/x86_64/kmod-nvidia-latest-dkms-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-NVML-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-NvFBCOpenGL-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-cuda-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-cuda-libs-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-devel-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-driver-libs-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-kmod-common-418.226.00-1.el8.noarch.rpm
rhel8/x86_64/nvidia-libXNVCtrl-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-libXNVCtrl-devel-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-modprobe-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-persistenced-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-settings-418.226.00-1.el8.x86_64.rpm
rhel8/x86_64/nvidia-xconfig-418.226.00-1.el8.x86_64.rpm

We use the --update flag with createrepo_c to generate the metadata, which continued to propagate this issue in subsequent releases.

I have re-generated the RPM metadata from scratch for the rhel8/x86_64 repository and validated the rest of the CUDA repos.

@kmittman kmittman self-assigned this Nov 19, 2021
kmittman added a commit that referenced this issue Nov 19, 2021
 - Flattens package sections into one-liners for grepping
 - Compares file-size and SHA256 checksums
 - RPM only: scans through repodata history to pin-point when mismatch occurred (issue #2)

Signed-off-by: Kevin Mittman <[email protected]>
@kmittman
Collaborator

Thank you for reporting this issue, @pjgeorg. Please let me know if you see any similar mismatches in the future. Closing as resolved.

  • ✔️ $ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
  • ✔️ $ sudo dnf install cuda-compat-10-1
  • ✔️ $ sudo dnf module install nvidia-driver:418

This was not the usual issue seen in #1, so I have added a new script repo-validate.sh to check for this going forward.

$ ./repo-validate.sh --mirror=/path/to/snapshot --distro=rhel8 --arch=x86_64 
[...]
cuda-compat-10-1-418.226.00-1.x86_64.rpm [5713804] [b6556b47df]
cuda-drivers-418.226.00-1.x86_64.rpm [7156] [066f1bbb7e]
kmod-nvidia-418.226.00-4.18.0-305.19.1-418.226.00-3.el8_4.x86_64.rpm [12744460] [5578677cfa]
kmod-nvidia-418.226.00-4.18.0-305.25.1-418.226.00-3.el8_4.x86_64.rpm [12744408] [f0e1f9e42d]
kmod-nvidia-418.226.00-4.18.0-348-418.226.00-3.el8.x86_64.rpm [12744232] [94a2644460]
kmod-nvidia-418.226.00-4.18.0-348.2.1-418.226.00-3.el8_5.x86_64.rpm [12744480] [b68c1c6201]
kmod-nvidia-latest-dkms-418.226.00-1.el8.x86_64.rpm [12394276] [ad19aa551b]
nvidia-driver-418.226.00-1.el8.x86_64.rpm [2590320] [803f8b814b]
nvidia-driver-NVML-418.226.00-1.el8.x86_64.rpm [474164] [6a2f622370]
nvidia-driver-NvFBCOpenGL-418.226.00-1.el8.x86_64.rpm [112140] [4d8b997c73]
nvidia-driver-cuda-418.226.00-1.el8.x86_64.rpm [309724] [a165bfb9ec]
nvidia-driver-cuda-libs-418.226.00-1.el8.x86_64.rpm [24639824] [752f329c15]
nvidia-driver-devel-418.226.00-1.el8.x86_64.rpm [12508] [5067b47fdb]
nvidia-driver-libs-418.226.00-1.el8.x86_64.rpm [35801420] [1274ad473b]
nvidia-kmod-common-418.226.00-1.el8.noarch.rpm [10272] [35a7931edd]
nvidia-libXNVCtrl-418.226.00-1.el8.x86_64.rpm [51968] [208d000da8]
nvidia-libXNVCtrl-devel-418.226.00-1.el8.x86_64.rpm [55280] [942d4865b7]
nvidia-modprobe-418.226.00-1.el8.x86_64.rpm [74256] [ba3bc5b729]
nvidia-persistenced-418.226.00-1.el8.x86_64.rpm [106876] [f669939931]
nvidia-settings-418.226.00-1.el8.x86_64.rpm [3645528] [5a9a8f551d]
nvidia-xconfig-418.226.00-1.el8.x86_64.rpm [269540] [577e2cee84]
[...]

Everything matches now.
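
To reproduce one line of the listing above for a locally downloaded package, something like the following could be used. This is only a sketch and not the actual repo-validate.sh logic: it assumes the second bracket holds the first 10 hex characters of the file's SHA256, which is inferred from the output format, and `describe_package` is a hypothetical name.

```python
import hashlib
import os


def describe_package(path, digest_chars=10):
    """Return 'filename [size-in-bytes] [sha256-prefix]' for a local file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large RPMs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha256.update(chunk)
    size = os.path.getsize(path)
    # Assumption: the listing truncates the SHA256 to its first 10 characters.
    prefix = sha256.hexdigest()[:digest_chars]
    return f"{os.path.basename(path)} [{size}] [{prefix}]"
```

Comparing this line for the downloaded file against the corresponding line derived from the metadata shows whether both the size and the checksum agree.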
