A language-modeling notebook built with PyTorch for next-word prediction.
This project demonstrates how to build a Next-Word Prediction Model using PyTorch.
The notebook (next_word_prediction.ipynb) covers the complete workflow:
text preprocessing → tokenization → building a vocabulary → creating training sequences → defining a PyTorch model → training → inference.
The model predicts the next likely word given a partial sentence.
- Built entirely using PyTorch
- Custom neural network for next-word prediction
- Tokenization and vocabulary building
- Dataset → tensor conversion
- Training loop written manually (no high-level wrappers)
- Inference using model output probabilities
- Easy to extend (larger models, more layers, pretrained embeddings, etc.)
- Load text data directly in the notebook.
- Clean and normalize (lowercasing, punctuation removal).
- Tokenize into words.
- Map each unique word to an integer index.
- Create `word_to_index` and `index_to_word` mappings.
- Convert the text into input sequences for training. Example: input `"I love deep"` → label `"learning"`.
- Build PyTorch tensors for:
  - Input sequences
  - Targets (next word)
- Use `DataLoader` for batching (see the sketch below).
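A minimal sketch of these steps on a toy corpus; the notebook's actual data source, context window, and batch size may differ:

```python
import re
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy corpus for illustration; the notebook loads its own text data.
text = "I love deep learning and I love PyTorch"

# Clean and normalize: lowercase, strip punctuation, split into words.
tokens = re.sub(r"[^a-z\s]", "", text.lower()).split()

# Vocabulary: map each unique word to an integer index, and back.
vocab = sorted(set(tokens))
word_to_index = {word: i for i, word in enumerate(vocab)}
index_to_word = {i: word for word, i in word_to_index.items()}

# Sliding window: a fixed number of context words predicts the next word.
context_size = 3
inputs, targets = [], []
for i in range(len(tokens) - context_size):
    inputs.append([word_to_index[w] for w in tokens[i:i + context_size]])
    targets.append(word_to_index[tokens[i + context_size]])

X = torch.tensor(inputs)   # shape: (num_sequences, context_size)
y = torch.tensor(targets)  # shape: (num_sequences,)

# Batch the (input, target) pairs with a DataLoader.
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
```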
Typical components include:
- Embedding layer
- LSTM / GRU / RNN
- Linear (Fully Connected) output layer
- Softmax for prediction over vocabulary
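A minimal sketch of such a model using an LSTM; the embedding and hidden sizes here are illustrative assumptions, not the notebook's exact values:

```python
import torch.nn as nn

class NextWordModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)          # word index -> dense vector
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # sequence encoder
        self.fc = nn.Linear(hidden_dim, vocab_size)                   # scores over the vocabulary

    def forward(self, x):
        embedded = self.embedding(x)      # (batch, seq_len, embed_dim)
        output, _ = self.lstm(embedded)   # (batch, seq_len, hidden_dim)
        return self.fc(output[:, -1, :])  # logits from the last time step
```

Note that no explicit softmax layer appears in the sketch: `nn.CrossEntropyLoss` applies it internally during training, and softmax is applied separately at inference time.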
The model is defined with `torch.nn` building blocks (`import torch.nn as nn`). The notebook implements:
- Forward pass
- Loss calculation (CrossEntropyLoss)
- Backward pass
- Optimizer step
- Epoch-level logging
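A sketch of that loop, reusing `NextWordModel` and `loader` from the sketches above; the optimizer choice, learning rate, and epoch count are illustrative assumptions:

```python
import torch.nn as nn
import torch.optim as optim

model = NextWordModel(vocab_size=len(vocab))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    total_loss = 0.0
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        logits = model(batch_x)             # forward pass
        loss = criterion(logits, batch_y)   # loss calculation
        loss.backward()                     # backward pass
        optimizer.step()                    # optimizer step
        total_loss += loss.item()
    print(f"Epoch {epoch + 1}: loss = {total_loss / len(loader):.4f}")  # epoch-level logging
```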
Given input text:

```python
predict_next_word("The world is")
```

The model:
- Tokenizes
- Passes through network
- Gets softmax probabilities
- Selects highest-probability next word
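A minimal version of such a helper, assuming the `model`, `word_to_index`, and `index_to_word` objects from the sketches above and greedy (argmax) decoding:

```python
import torch

def predict_next_word(text):
    # Tokenize with the same vocabulary used in training; unknown words are skipped.
    ids = [word_to_index[w] for w in text.lower().split() if w in word_to_index]
    x = torch.tensor([ids])  # batch of one sequence

    model.eval()
    with torch.no_grad():
        logits = model(x)                      # pass through the network
        probs = torch.softmax(logits, dim=-1)  # softmax probabilities over the vocabulary

    # Select the highest-probability next word.
    return index_to_word[probs.argmax(dim=-1).item()]

print(predict_next_word("I love deep"))
```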
| Library | Purpose |
|---|---|
| PyTorch | Model, training loop, tensors |
| NumPy | Data operations |
| NLTK (optional) | Tokenization / stopwords |
| re | Text cleaning |
| Matplotlib (optional) | Plotting loss |
No TensorFlow or Keras is used.
Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows
```

Example `requirements.txt`:

```
torch
numpy
nltk
matplotlib
```

Install:

```
pip install -r requirements.txt
```

Launch the notebook:

```
jupyter notebook next_word_prediction.ipynb
```

Execute all cells; the notebook will preprocess the data, build the vocabulary, train the PyTorch model, and save it if saving is implemented.
At the bottom of the notebook:

```python
predict_next_word("I want to")
```

Output example:

```
"learn"
```
```
├── next_word_prediction.ipynb   # Main PyTorch notebook
├── requirements.txt             # Dependencies
└── README.md                    # Documentation
```