This repository has been archived by the owner on Jul 2, 2023. It is now read-only.

Mills with robot antennae

millzbot

A GPT-2 chatbot trained on my boss's tweets, and a guide to making your own

Introduction

After seeing so many projects being made with OpenAI's GPT-2, I decided to give it a whirl as a first foray into training machine learning models, as well as building bots for Slack and Twitter. Naturally, my first thought was to fine-tune the model on @millsustwo's tweets (with his permission). If GPT-2 can adapt to his unique writing style, then it really is as good as they say.

Luckily, it is that good! Millzbot lives on Twitter, on a demo site, and as a Slack bot in ustwo's Slack workspace.

Twitter screenshot Slackbot screen recording

(The model is working fine. He actually tweets like that.)

Talk to millzbot (live demo) | Follow millzbot on Twitter | See some of my favourite responses from millzbot

Make your own

On the whole, the process was surprisingly easy, thanks largely to Max Woolf and the writing and tools he has already created around GPT-2. In this repo you'll find a summary of the process I used to make millzbot (with links to the guides and resources I used), and some of Max's tweet-fetching and server code. You'll also find some of my own code: declarative Terraform infrastructure, and middleware API layers, written as serverless functions, that pass responses from the trained model to various platforms (Slack, Twitter and the demo site).

The instructions below and code in this project's folders should cover everything you need to do the following:

  1. Get your initial dataset
  2. Use it to fine-tune a GPT-2 instance with Google Colaboratory
  3. Build and test your trained model locally in a Docker container
  4. Deploy your container to a server with Cloud Run
  5. Request generated text from it via simple POST requests or serverless functions

And at the end of it you should have your own GPT-2 bot (for free!). I highly recommend that you also read through the links that I reference for more detailed instructions, and to get a more thorough understanding of GPT-2.

Training the model

Fork this repo and clone it to your computer, then go to the /training folder. In there you'll find a script written by Max that can bulk download tweets from a user, and instructions on how to use it. Once you have the tweets downloaded, just follow the instructions on Max's Colaboratory notebook to train the model with a GPU (for free).

If you're not training the model on tweets, and instead are using some other text source, then make sure you have that dataset ready, and use this notebook instead to do the training.

Fine-tuning GPT-2 can take hours, so check on it every now and again but otherwise relax (and maybe read the next section to find some tasks to do in the meantime). Once it has finished, test the model a few times with the notebook's generate cell, and try changing variables such as temperature or prefix. When you're happy with the model, download it and uncompress it inside the /backend/model folder. Now the easy bit is over 😈

Deploying to a server

Now that we've got our trained model ready to serve up text, it's time to put it somewhere! If you head to the /backend folder, you'll see instructions in its readme for building and deploying your model, so you can start making requests to it over HTTP. The instructions are derived from this guide.

To do this, you'll need to create a free Google Cloud Platform account and project (that will use Cloud Run), and install Docker and the Google Cloud SDK. It might be worth getting all of this set up while your model is training in the step above. Once this is done, follow the steps in /backend's readme to build and deploy your model.

Requesting generated text

Now that our trained model is hosted online, we can make requests to it! An example of a request written in JavaScript could look like the following:

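// MODEL_ENDPOINT_URL is the URL of your deployed Cloud Run service.
const getGeneratedText = async () => {
    const response = await fetch(MODEL_ENDPOINT_URL);
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    const responseData = await response.json();
    return responseData.text;
};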

Just like that we can now get GPT-2 generated text, fine-tuned with your dataset, from anywhere!
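
If you want to steer the output (for example with the temperature or prefix values you tried in the notebook), a POST request can carry those along. The following is a minimal sketch, not the code from this repo: it assumes your deployed server reads a JSON body with `prefix`, `temperature` and `length` fields and replies with `{ text }`, so check the server code in /backend before relying on those names. `buildGenerateRequest` and `requestGeneratedText` are hypothetical helpers made up for this example.

```javascript
// Build fetch options for a POST request carrying generation parameters.
// The field names (prefix, temperature, length) are assumptions about the server.
const buildGenerateRequest = ({ prefix = "", temperature = 0.7, length = 100 } = {}) => ({
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prefix, temperature, length }),
});

// fetchImpl is injectable so the function can be tried without a live server.
const requestGeneratedText = async (endpointUrl, params, fetchImpl = fetch) => {
    const response = await fetchImpl(endpointUrl, buildGenerateRequest(params));
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    const data = await response.json();
    return data.text;
};
```

Calling `requestGeneratedText(MODEL_ENDPOINT_URL, { prefix: "Good morning", temperature: 0.9 })` would then resolve to the generated text, provided the server behaves as assumed above.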

For more complicated requests, where the inputs or responses need manipulating, you may need to create a "middleware" API that can include its own logic. In the /backend/functions folder are some examples of functions I've written to do this. They handle requests for different platforms (Twitter, Slack and the demo site) and are deployed as serverless functions (also on GCP).
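
To give a rough idea of what such a middleware function might do, here is a small sketch. It is an illustration only, not the code in /backend/functions: it assumes an Express-style `(req, res)` handler of the kind HTTP-triggered serverless functions on Node.js typically receive, and it trims the model output to Twitter's 280-character limit. `trimToLimit` and `makeTweetHandler` are hypothetical names.

```javascript
// Hypothetical helper: cut generated text down to a platform's length limit,
// breaking at the last space inside the limit when there is one.
const trimToLimit = (text, limit = 280) => {
    const cleaned = text.trim();
    if (cleaned.length <= limit) return cleaned;
    const cut = cleaned.slice(0, limit);
    const lastSpace = cut.lastIndexOf(" ");
    return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trimEnd();
};

// Factory for an Express-style (req, res) handler.
// getText is any async function that returns raw model output.
const makeTweetHandler = (getText) => async (req, res) => {
    try {
        const raw = await getText();
        res.status(200).json({ text: trimToLimit(raw) });
    } catch (err) {
        // The model server was unreachable or returned an error.
        res.status(502).json({ error: "Model request failed" });
    }
};
```

Passing the request function in as `getText` keeps the platform-specific logic separate from the call to the model, so the same handler shape could be reused for Slack or the demo site with a different post-processing step.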

And that's about everything you need to get set up with your own GPT-2-based bot! 🤖

Resources

Below are some of Max Woolf's resources and guides that this project is built on top of. Please visit them for more in-depth instructions and reading.

Bot Ethics

Bots are a confusing area as far as ethics goes. There are plenty of thoughts online on the topic, which I would urge you to research, but at a minimum you should adhere to the following rules:

  1. Make sure to get permission from whoever you will be impersonating

    • In my case, I asked Mills and told him what data I would collect and what the bot would be used for. Thankfully, he granted me permission.
  2. Make it obvious that your bot is a bot

    • Stick it in the Twitter name and bio, for example
    • Say whether or not you are curating the output or if it's fully automated
  3. Don't keep hold of data that you don't need to

    • Once you've fine-tuned your GPT-2 model and are happy with it, you shouldn't need the raw data anymore. Delete it to avoid any risk of mishandling.

License

This repo is MIT Licensed.