The content in this repo follows on an introduction presentation in class.
We've prepared a couple of examples to get you started with ML5.js with vanilla JavaScript. They are explained in depth during class. You can find these examples in the demos folder:
- Body Pose
- Body Segmentation
- Hand Pose
- Face Mesh
- Image Classifier
- Sound Classifier
- Sentiment Analysis
- Teachable Machine
- Neural Network Rock-Paper-Scissors
We'll build a version of the "Pong" arcade game, which we'll control with our eyebrows.
Let's combine the Face Mesh model with a Neural Network to control the paddle.
Create a copy of the Neural Network Rock-Paper-Scissors and make some adjustments.
- Change the title and h1 to "Face Pong".
- Change the buttons, so they say "up", "neutral", and "down". Make sure to update the ids and variable names as well and change the values you pass into the
sample()
function. VS Code Tip: UseCtrl + D
to select all occurrences of the same word. Additionally, you can right-click on a variable and select "rename symbol" to change all occurrences of that variable. - Change the
handPose
andhands
variables tofaceMesh
andfaces
. - Load the Face Mesh model instead of the Hand Pose model.
- Make the circles a bit smaller in your drawVideoWithKeyPoints function. You should see your face points on the screen.
Let's try training our model to recognize when we:
- Raise our eyebrows (move the paddle up)
- Neutral (keep the paddle in the middle)
- Lower our eyebrows (move the paddle down)
Take a lot of sample for these 3 classes and hit the train button. We're getting this loss function:
The loss doesn't really go down, and the model doesn't seem to work. There's a couple of strategies you can take when this happens:
- Improve the quality of your data
- Change the model architecture
- Train for a longer time
- Change the learning rate
Right now, we are passing a lot of irrelevant points to the model. We're only interested in the positions of our eyebrows vs our eyes.
As we will need these keypoints in multiple places, let's create a function that will return the keypoints we need.
const getRelevantKeyPoints = (face) => {
return [
...face.leftEye.keypoints,
...face.leftEyebrow.keypoints,
...face.rightEye.keypoints,
...face.rightEyebrow.keypoints,
];
};
We could even reduce the number of keypoints, by just taking every 3rd keypoint:
const getRelevantKeyPoints = (face) => {
return [
...face.leftEye.keypoints.filter((keypoint, index) => index % 3 === 0),
...face.leftEyebrow.keypoints.filter((keypoint, index) => index % 3 === 0),
...face.rightEye.keypoints.filter((keypoint, index) => index % 3 === 0),
...face.rightEyebrow.keypoints.filter((keypoint, index) => index % 3 === 0),
];
};
Let's try training our model again with this reduced set of data. Don't forget to use this getRelevantKeyPoints(face)
function in your sample()
and prediction logic:
const keypoints = getRelevantKeyPoints(faces[0]).map(keypoint => [keypoint.x, keypoint.y]).flat();
The loss function goes down, but it seems like the training needs more time to finish:
Increase the number of epochs to 200 and try training again:
const options = {
epochs: 200,
};
classifier.train(options, whileTraining, finishedTraining);
The loss function goes down further, but not fast enough. However, there's one more trick we can apply to our input data.
We want the model to work regardless of the position of our face on the screen.
First of all, let's make sure our eyes are always positioned on the top-left of the canvas, no matter where our face is on the screen. We can do so by subtracting the minimum x and y values from all the keypoints:
const getRelevantKeyPoints = (face) => {
const keypointMap = (keypoint) => {
return {
x: keypoint.x - face.box.xMin,
y: keypoint.y - face.box.yMin,
z: keypoint.z
}
};
return [
...face.leftEye.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
...face.leftEyebrow.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
...face.rightEye.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
...face.rightEyebrow.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
];
};
You should see your face points on the top-left of the screen. There's one more issue though: if we move closer and further from the screen, the distance between our eyes and eyebrows will change. We can normalize the distance between our eyes and eyebrows by dividing all the y values by the distance between our eyes:
const getRelevantKeyPoints = (face) => {
const leftEyeCenter = {
x: face.leftEye.centerX,
y: face.leftEye.centerY,
};
const rightEyeCenter = {
x: face.rightEye.centerX,
y: face.rightEye.centerY,
};
// calculate the distance between the eyes, we use this as a distance-based scaling factor
const eyeDistance = Math.sqrt(
Math.pow(leftEyeCenter.x - rightEyeCenter.x, 2) +
Math.pow(leftEyeCenter.y - rightEyeCenter.y, 2)
);
const targetDistance = 100;
const keypointMap = (keypoint) => {
return {
x: targetDistance * (keypoint.x - face.box.xMin) / eyeDistance,
y: targetDistance * (keypoint.y - face.box.yMin) / eyeDistance,
z: keypoint.z
}
};
return [
...face.leftEye.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
...face.leftEyebrow.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
...face.rightEye.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
...face.rightEyebrow.keypoints.filter((keypoint, index) => index % 3 === 0).map(keypointMap),
];
};
Go through the sampling and training process again. Make sure to capture different angles of your face, to make your model more robust. We're getting a nice loss graph:
Make sure to save your model once you're happy with the results!
You can compare your current state of this excercise with 01-face-pong.html.
Before we start creating the game, let's add an intro state. We'll present two options to our player: "Start" and "Create Model".
- Start should load the pretrained model and go to our game state (STATE_PREDICTING).
- Create Model should go to the sampling state (STATE_SAMPLING).
Try implementing this logic yourselves, before reading the step-by-step guide below.
Create an extra css selector to show items with the display-intro
class:
.intro .display-intro {
display: block;
}
Add a div with the display-intro
class to your html. This div should contain two buttons: "Start" and "Create Model".
<div class="hidden display-intro">
<button id="start">Start</button>
<button id="create">Create Model</button>
</div>
Define a new STATE_LOADING
, and make sure to add it to ALL_STATES
:
const STATE_LOADING = "loading";
const STATE_INTRO = "intro";
const STATE_SAMPLING = "sampling";
const STATE_TRAINING = "training";
const STATE_PREDICTING = "predicting";
const ALL_STATES = [
STATE_LOADING,
STATE_INTRO,
STATE_SAMPLING,
STATE_TRAINING,
STATE_PREDICTING
];
Get references to the two new buttons at the top of your script:
const $start = document.querySelector('#start');
const $create = document.querySelector('#create');
In the section where you add the event listeners, add the following:
$start.addEventListener('click', () => {
classifier.load(pretrainedModelURL.toString(), () => {
console.log('model loaded');
setState(STATE_PREDICTING);
});
});
$create.addEventListener('click', () => {
setState(STATE_SAMPLING);
});
Finally, replace the setState(STATE_SAMPLING);
with setState(STATE_INTRO);
at the end of your setup()
function.
You should now be able to either go to the game state by pressing the start button or go to the sampling state by pressing the create button. You can compare your version with the solution in 02-face-pong.html.
We've reached the game-creation part of our exercise! We now have a robust system to detect whether we want to move the paddle up, down, or keep it in the middle. Let's create a simple game where we control the paddle with our eyebrows.
Create 2 new global variables which describe the paddle position and size:
const PADDLE_SIZE = 100;
// start with the paddle in the middle
let paddleY = 480 / 2 - PADDLE_SIZE / 2;
Update the drawPredicting()
function to update the paddle position based on the prediction:
const drawPredicting = () => {
drawVideoWithKeyPoints();
const classification = classificationResults[0]?.label;
$results.textContent = classification;
// update the paddle position based on the classification
if (classification === 'up') {
paddleY -= 5;
} else if (classification === 'down') {
paddleY += 5;
}
// draw the paddle
ctx.fillStyle = 'blue';
ctx.fillRect(0, paddleY, 20, PADDLE_SIZE);
};
You should see the paddle move, based upon your eyebrow movements:
You can compare your current state of this excercise with 03-face-pong.html.
We need to make sure the paddle doesn't move outside of the canvas. We can do so by adding the following code to the drawPredicting()
function:
// make sure the paddle doesn't move outside of the canvas
paddleY = Math.max(0, Math.min(paddleY, 480 - PADDLE_SIZE));
Next up, create the necessary variables in your global scope for controlling the ball. We need to keep track of the position and the speed of the ball:
const BALL_SIZE = 20;
let ballX = 320;
let ballY = 240;
let ballSpeedX = 5;
let ballSpeedY = 5;
Update the drawPredicting()
function to update the ball position and draw it:
// update the ball position
ballX += ballSpeedX;
ballY += ballSpeedY;
// draw the ball
ctx.fillStyle = 'green';
ctx.beginPath();
ctx.arc(ballX, ballY, BALL_SIZE, 0, 2 * Math.PI);
ctx.fill();
You should see the ball move around the screen. However, it will move off the screen, as we haven't implemented any collision detection with the walls yet.
Update the drawPredicting()
function to make the ball bounce off the walls:
// update the ball position
ballX += ballSpeedX;
ballY += ballSpeedY;
// make the ball bounce off the walls
if (ballX - BALL_SIZE / 2 + ballSpeedX < 0 || ballX + BALL_SIZE / 2 + ballSpeedX > canvas.width) {
ballSpeedX *= -1;
}
if (ballY - BALL_SIZE / 2 + ballSpeedY < 0 || ballY + BALL_SIZE / 2 + ballSpeedY > canvas.height) {
ballSpeedY *= -1;
}
You should now see the ball bounce off the walls. You can compare your current state of this excercise with 04-face-pong.html.
We need to check if the ball collides with the paddle. If it does, we should make the ball bounce back. If it doesn't, we should go to a lost state.
Update the drawPredicting()
function to check for collisions with the paddle:
// check for collision with the paddle
if (ballX - BALL_SIZE / 2 < 20 && ballY > paddleY && ballY < paddleY + PADDLE_SIZE) {
ballSpeedX *= -1;
}
Additionally, check if the ball goes off the left side of the screen. To do so, remove the bounce logic for that side, and add a seperate check for the ball going off the screen. If it does, reset the ball position and speed, and move to the intro state for now (we'll create a lost state later):
// check if we missed the ball
if (ballX - BALL_SIZE / 2 + ballSpeedX < 0) {
// reset the ball position
ballX = 320;
ballY = 240;
// reset the ball speed
ballSpeedX = 5;
ballSpeedY = 5;
// go to the intro state for now
setState(STATE_INTRO);
}
// make the ball bounce off the walls
if (ballX + BALL_SIZE / 2 + ballSpeedX > canvas.width) {
ballSpeedX *= -1;
}
if (ballY - BALL_SIZE / 2 + ballSpeedY < 0 || ballY + BALL_SIZE / 2 + ballSpeedY > canvas.height) {
ballSpeedY *= -1;
}
Once this is working, we can create a lost state:
- Create a css selector for that state
- Define a
STATE_LOST
variable and add it toALL_STATES
- Create a
drawLost()
function and call that one from yourdraw()
function when the state isSTATE_LOST
- Add a button to restart the game. This button should go back to the
STATE_PREDICTING
state.
You can compare your current state of this excercise with 05-face-pong.html.
Our game is quite simple at the moment. You could add more features to it, such as:
- Add a score counter for each time you bounce the ball off the paddle
- Add a countdown before the game starts
- Cleaning up the graphics
- Adding sound effects