Chap 15, pg 513 : ModuleNotFoundError: No module named 'torchdata.datapipes' #199

Open
Emmanuel-Ibekwe opened this issue Dec 24, 2024 · 3 comments

Emmanuel-Ibekwe commented Dec 24, 2024

```python
from torchtext.datasets import IMDB
train_dataset = IMDB(split='train')
test_dataset = IMDB(split='test')
```
I keep getting this error despite manually installing torchdata. When I tried installing the exact version of torchtext used in the chapter, version 0.10.0, pip did not recognize it as a valid version.

I can't find any solution to it online.

kostuyn commented Jan 4, 2025

@Emmanuel-Ibekwe I installed version 0.17.0 of the package and it works (on Colab):
!pip install portalocker --quiet
!pip install torchtext==0.17.0 --quiet

After installing, choose Runtime -> Restart runtime in the Colab menu.

(The latest version of torchtext has a problem; see pytorch/text#2272.)
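
After the restart, it can help to confirm which versions the runtime actually picked up and whether `torchdata.datapipes` is importable at all. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `can_import` are made up for illustration and are not part of torchtext or torchdata.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def can_import(module_name):
    """Return True if the module imports in the current runtime, False otherwise."""
    try:
        import_module(module_name)
        return True
    except Exception:
        # Any failure (missing module, broken binary extension, ...) counts as unavailable.
        return False


# Report the packages mentioned in this thread.
for name in ("torch", "torchtext", "torchdata", "portalocker"):
    print(f"{name}: {installed_version(name) or 'not installed'}")

# This is the module named in the ModuleNotFoundError from the issue title.
print("torchdata.datapipes importable:", can_import("torchdata.datapipes"))
```

If `torchtext` reports 0.17.0 but `torchdata.datapipes` is still not importable, the installed torchdata version is the next thing to look at.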

Owner

rasbt commented Jan 4, 2025

@Emmanuel-Ibekwe It looks like you are right, and the PyTorch maintainers removed torchtext 0.10.0 from PyPI for some reason. However, the ch15 notebook here on GitHub should be updated to work with newer versions of torchtext, as @kostuyn mentioned. That requires installing portalocker as well, as described above. Let us know in case this still doesn't work.

Author

Emmanuel-Ibekwe commented Jan 7, 2025

Thanks @rasbt and @kostuyn for the responses. I found out through ChatGPT (great tool) that the datasets package from Hugging Face includes the IMDB dataset, so I used that instead.
With the datasets package, my training and validation accuracies per epoch differed from the ones in the text, and the model overfit: at some point both accuracies stayed at 100%, yet the model performed terribly on the test dataset, where I got an accuracy of 68.5%.
Thanks once more.

Edit: I built a custom dataset class for the IMDB data using torch.utils.data to help with data loading.
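
For anyone taking the same route, here is a minimal sketch of what such a wrapper might look like. It is not the code used above. The class name `IMDBDataset` and the helper `load_imdb_splits` are hypothetical, and the sketch assumes the Hugging Face `imdb` dataset with its `text` and `label` columns, where `label` is 0 for negative and 1 for positive. That encoding may differ from what the chapter's label-processing code expects, so that step may need adjusting.

```python
try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch can still be run without PyTorch installed.
    Dataset = object


class IMDBDataset(Dataset):
    """Map-style dataset that yields (label, text) pairs."""

    def __init__(self, records):
        # records: any indexable sequence of {"text": str, "label": int} rows,
        # e.g. one split of the Hugging Face "imdb" dataset.
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records[idx]
        return row["label"], row["text"]


def load_imdb_splits():
    """Download IMDB with Hugging Face `datasets` (needs network access)."""
    from datasets import load_dataset  # imported lazily; only needed here

    imdb = load_dataset("imdb")
    return IMDBDataset(imdb["train"]), IMDBDataset(imdb["test"])
```

Calling `train_dataset, test_dataset = load_imdb_splits()` would then give two objects that can be passed to `torch.utils.data.random_split` and `DataLoader` in place of the torchtext datasets.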
