I just wanted to point out that the data loader in this implementation seems a lot less efficient than it could be. Right now, the code writes each encoded image into a separate .npy file and, during training, loads each file in a batch separately. That's a lot of wasteful file I/O. You could instead save all pre-extracted features in a single array/tensor and load that one file into RAM (or even into GPU memory) once before training starts. The entire ImageNet takes up only 5 GB if you store it as uint8 this way, e.g.: https://huggingface.co/datasets/cloneofsimo/imagenet.int8.
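A minimal sketch of what packing could look like, assuming the per-image features are float arrays quantized linearly to uint8 over a fixed clipping range (the range, the helper name `pack_features`, and the paths are all hypothetical, not taken from this repo):

```python
import numpy as np

def pack_features(feature_files, out_path, shape, lo=-5.0, hi=5.0):
    """Pack per-image float .npy features into one uint8 .npy array on disk.

    lo/hi are an assumed clipping range for the latents; they must be
    stored alongside the packed file so training can dequantize later.
    Uses a memory-mapped output so the whole dataset never has to fit
    in RAM during packing.
    """
    packed = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=np.uint8,
        shape=(len(feature_files), *shape),
    )
    for i, f in enumerate(feature_files):
        x = np.clip(np.load(f), lo, hi)
        # Linear map [lo, hi] -> [0, 255], then round to uint8.
        packed[i] = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)
    packed.flush()
    return out_path
```

At train time the single packed file can then be loaded once (e.g. `np.load(out_path)`, or moved to the GPU wholesale), instead of touching the filesystem per sample.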
No, I haven't tried it myself yet, but given that the compression seems near-lossless, and given the qualitative reconstruction results, I would not expect any noticeable performance degradation. Note that uint8 would only be used for storing the input; the rest of the model would still use bf16/fp32, so it should not cause any training-stability issues.
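Concretely, the uint8 storage would only need a dequantization step at batch time before the features enter the (bf16/fp32) model. A sketch, assuming the same hypothetical linear quantization range as above:

```python
import numpy as np

def dequantize(batch_u8, lo=-5.0, hi=5.0):
    """Map uint8-stored latents back to float for the model input.

    lo/hi are the assumed clipping range used at quantization time.
    The result is float32 here; a framework-side cast to bf16 would
    follow, so only the stored copy is uint8, not the compute.
    """
    return batch_u8.astype(np.float32) / 255.0 * (hi - lo) + lo
```

Since this runs per batch on already-resident data, it adds negligible overhead compared with per-file disk reads.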