"learning nlp with the most formulaic possible text"
This project investigates text generation, in particular the popular GPT-2 model. Not knowing how easy it would be to train, and wanting to avoid getting too frustrated with what I'd hoped would be a fun diversion, I decided to train it to produce fake Power Rangers plot synopses from a training set of existing episode synopses I scraped from the fan wiki (see the fine-tuning sketch after this list). I chose this topic since Power Rangers has a few traits I'd anticipated being particularly advantageous, namely:
- it's completely ridiculous, so the synopses have a high chance of being funny
- there are a lot of episodes to train the model with (900-odd, damn)
- it's super formulaic, with only a handful of recurring characters, which should make its patterns easier to learn
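For reference, here's a minimal sketch of what the fine-tuning step can look like. I'm using the gpt-2-simple library as an assumption for illustration (it may not match the exact setup in this repo), and `synopses.txt` is a hypothetical filename for the scraped episode text:

```python
import gpt_2_simple as gpt2

# download the smallest pretrained GPT-2 checkpoint (124M parameters)
gpt2.download_gpt2(model_name="124M")

# synopses.txt (hypothetical): the scraped episode synopses as plain text
sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset="synopses.txt",
    model_name="124M",
    steps=1000,       # a modest number of steps goes a long way on ~900 short texts
    save_every=200,   # periodically checkpoint to checkpoint/run1
    print_every=50,   # log the training loss as it goes
)
```

The small 124M model is a sensible default for a dataset this size; the larger checkpoints are slower to fine-tune and probably overkill for text this formulaic.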
I also did some basic EDA, since you can never practice that enough, but to be honest it wasn't at all relevant or helpful here.
The resulting model was surprisingly successful given the absolute bare minimum of effort; output samples can be found in the 'samples' folder, and a rough sketch of the sampling code follows the highlights below. Highlights include:
- the blue ranger repeatedly getting married
- the black ranger flirting with criminality
- repeated references to some character named Matthew
- a surprisingly well-written finale
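For anyone curious, sampling from the fine-tuned checkpoint is similarly short. Again this assumes gpt-2-simple, and the prefix is just a hypothetical prompt:

```python
import gpt_2_simple as gpt2

# reload the fine-tuned weights from checkpoint/run1
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="run1")

# generate a handful of fake synopses; temperature trades coherence for chaos
gpt2.generate(
    sess,
    run_name="run1",
    nsamples=5,
    length=200,
    temperature=0.8,
    prefix="The Rangers",  # hypothetical prompt to steer the opening
)
```

gpt2.generate_to_file() works the same way if you'd rather dump output straight into a folder like 'samples'.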
Overall I'm fairly happy with how this one turned out :)