This is a behavioral cloning project that I did as part of the Udacity Self-Driving Car Nanodegree. Behavioral cloning is a technique for teaching a neural network to behave in a desired way by training it to repeat the actions that a known agent takes in similar situations. The agent can be a human or some algorithm, for example. In this project we need to steer a car driving in the simulator (Linux, MacOS, Win32, Win64) based on images from an onboard camera. So first we drive the car ourselves and collect the camera images together with the corresponding steering wheel angles. Then we teach a neural network to return a steering angle for a given image, which is a well-known and well-tested supervised learning task. Sounds pretty simple.
There is a nice Nvidia paper where they applied this approach to drive a real car. You can find videos of it on YouTube. It's pretty amazing to watch. I use the same method, but simplified. Since we are working with images, we need a convolutional neural network (CNN), which is specialized for image processing. There are a lot of free tutorials about CNNs on the internet. So I decided to use a CNN similar to Nvidia's.
Why this architecture? I used grayscale images normalized to the interval [-0.5, 0.5] to reduce memory and GPU usage during training. That's why the input is a 1-channel image. I tried to keep the neural network minimalistic, but it is hard to call it tiny: it has 3 129 152 parameters, 110 368 in the convolutional part and 3 018 784 in the fully connected part. That is a big number of parameters, and additional experimenting could probably decrease it. I also used a fully connected part with 5 hidden layers so that it can have its own low-, middle- and high-level features, as is done in the CNN. To deal with overfitting I used dropout with probability 0.5 on all layers except the last 3, which are relatively small.
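The input preprocessing described above can be sketched roughly as follows. This is only an illustration, not the code from model.py: the BT.601 luma weights, the function name `preprocess` and the 160x320 frame size are my assumptions, since the text only says that images are grayscaled and normalized to [-0.5, 0.5].

```python
import numpy as np

def preprocess(rgb):
    """Convert an HxWx3 uint8 RGB image to an HxWx1 float32 image in [-0.5, 0.5]."""
    # Assumption: BT.601 luma weights; the write-up does not say how grayscale is computed.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights   # shape (H, W), values in about [0, 255]
    normalized = gray / 255.0 - 0.5           # values in about [-0.5, 0.5]
    return normalized[..., np.newaxis]        # add the single channel axis

# Hypothetical frame size, used only to show the resulting shape.
frame = np.random.randint(0, 256, size=(160, 320, 3), dtype=np.uint8)
x = preprocess(frame)
print(x.shape)
```

Keeping one channel instead of three cuts the input tensor to a third of its size, which is where the memory saving comes from.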
For all supervised learning tasks, including our behavioral cloning, collecting training data is extremely important. We need to collect a dataset that correctly represents all situations that can emerge while driving. I mean not only situations where we drive straight along the road and turn left or right, but also situations where we need to recover the car from bad positions on the road, such as various deviations from the course. As we mostly drive straight, samples of straight driving will probably dominate the dataset and lead the model to overfit to straight driving. I decided to record a number of datasets with different behavior and mix them together:
- Straight driving with turns
- Strong disturbance recovery
- Medium disturbance recovery
- Light disturbance recovery
I recorded each disturbance recovery dataset by randomly disturbing the car to the right and recovering with left steering, and then excluded all right turns from that dataset. Then I repeated that procedure for the opposite direction, excluding left turns. So in the end I had 6 disturbance recovery datasets: 3 for recovery from the right and 3 for recovery from the left. I mixed 50% of the strong disturbance samples with 100% of the medium and light disturbance samples and excluded 90% of the straight driving samples. The resulting dataset has 12098 samples. There are 2 tracks in the simulator. I used only track 1 for recording samples. Track 2 is used to test how well the model generalizes.
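The mixing procedure can be sketched like this. It is a minimal illustration under several assumptions of mine: samples are (image path, steering) tuples, negative steering means left, "straight" means an absolute steering value below a hypothetical threshold of 0.01, and the 90% exclusion is applied to the mixed dataset. The function names are also made up for this example.

```python
import random

def drop_turns(samples, direction):
    """Remove the disturbance half of a recovery recording.

    Assumption: negative steering means left, positive means right.
    """
    if direction == "right":
        return [s for s in samples if s[1] <= 0]
    return [s for s in samples if s[1] >= 0]

def subsample(samples, keep_fraction, rng):
    """Keep a random subset with about keep_fraction of the samples."""
    return rng.sample(samples, round(len(samples) * keep_fraction))

def mix_datasets(straight_with_turns, strong, medium, light,
                 straight_threshold=0.01, seed=0):
    rng = random.Random(seed)
    # 50% of strong recovery, 100% of medium and light recovery.
    mixed = straight_with_turns + subsample(strong, 0.5, rng) + medium + light
    straight = [s for s in mixed if abs(s[1]) < straight_threshold]
    other = [s for s in mixed if abs(s[1]) >= straight_threshold]
    # Exclude 90% of the straight-driving samples.
    result = other + subsample(straight, 0.1, rng)
    rng.shuffle(result)
    return result

# Tiny synthetic example with made-up file names.
normal = [("s%d.jpg" % i, 0.0) for i in range(100)] + [("t%d.jpg" % i, 0.3) for i in range(20)]
strong = [("a%d.jpg" % i, -0.5) for i in range(10)]
medium = [("b%d.jpg" % i, -0.2) for i in range(5)]
light = [("c%d.jpg" % i, -0.1) for i in range(5)]
dataset = mix_datasets(normal, strong, medium, light)
print(len(dataset))
```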
The left picture shows the steering distribution in the dataset, and the right picture shows the steering distribution in the augmented dataset (read further for details).
12098 samples is not a big enough dataset to train a good-quality model without overfitting. That's why we need to augment the dataset with generated samples. A sample is an image and the corresponding steering wheel position. Here is an example of an input image.
So we need to generate samples with images and corresponding steering wheel positions. I decided to use these transformations, which leave the steering wheel value untouched:
- random image brightness change (-0.3, 0.3)
- random partial occlusion with 30 black 25x25 px squares
- very slight random:
  - rotation, 1 degree amplitude
  - shift, 2 px amplitude
  - scale, 0.02 amplitude
And one that changes it:
- flipping the image with corresponding flipping of the steering wheel value
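A few of the transformations listed above can be sketched with NumPy as below. This is not the project's actual augmentation code. I assume a single-channel image already normalized to [-0.5, 0.5] (so black is -0.5), an additive brightness offset, and a flip probability of 0.5; none of these details are stated in the text. The small rotation, shift and scale are left out because they need an image library.

```python
import numpy as np

def random_brightness(img, rng, amplitude=0.3):
    """Add one random offset from (-amplitude, amplitude) and clip to the valid range."""
    # Assumption: brightness is changed additively.
    return np.clip(img + rng.uniform(-amplitude, amplitude), -0.5, 0.5)

def random_occlusion(img, rng, count=30, size=25):
    """Paint `count` black squares of `size` x `size` px at random positions."""
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(count):
        top = rng.integers(0, h - size + 1)
        left = rng.integers(0, w - size + 1)
        out[top:top + size, left:left + size] = -0.5  # black in the normalized range
    return out

def flip(img, steering):
    """Mirror the image horizontally and negate the steering value."""
    return img[:, ::-1], -steering

def augment(img, steering, rng):
    img = random_brightness(img, rng)
    img = random_occlusion(img, rng)
    if rng.random() < 0.5:  # assumption: flip half of the samples
        img, steering = flip(img, steering)
    return img, steering

rng = np.random.default_rng(0)
image = np.zeros((160, 320), dtype=np.float32)  # hypothetical frame size
augmented, angle = augment(image, 0.25, rng)
```

Only the flip touches the label, which is why it is the one transformation that has to return a new steering value.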
I also perturbed the steering wheel value with small normal noise (0 mean, 0.005 standard deviation). You can see the resulting steering wheel distribution in the right-hand steering distribution picture shown earlier. Augmented images look like this:
For creating and training the model I used Keras, which has a big library of standard neural network blocks, so training neural networks with Keras is pretty simple and fun :) For training I used only augmented samples, so the model never saw the same sample twice. That again helps to prevent overfitting. I used the Adam optimizer with a 1e-4 learning rate and mean squared error as the loss function. I used a batch size of 112 samples and 30 epochs of 44800 samples each. I split the collected dataset into 67% training and 33% validation parts. As the test dataset I used straight driving samples recorded on track 2. I saved the model after every epoch and selected one of the models from the last epochs that is able to drive track 2. I trained several times and noticed that it is not always possible to select such a model, but the models come close to driving track 2. Track 2 was not used to record samples and has much sharper turns (and, for me, higher complexity), yet the trained model can drive it without having seen a single image from it. That fact was very surprising to me.
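The data side of this training setup can be sketched in plain Python as follows. It shows the 67/33 split, an endless generator that produces a fresh augmented batch every time, and the batch arithmetic. The Keras calls themselves are not shown, because the text does not say which API version was used. The function names, the seed and the way the split is rounded are my assumptions, and `jitter_steering` stands in for the full augmentation using only the steering noise (0 mean, 0.005 standard deviation) mentioned earlier.

```python
import random

BATCH_SIZE = 112
SAMPLES_PER_EPOCH = 44800
EPOCHS = 30

def train_validation_split(samples, train_fraction=0.67, seed=0):
    """Shuffle a copy of the samples and cut it into train and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def jitter_steering(sample, rng, sigma=0.005):
    """Stand-in augmentation: add small normal noise to the steering value."""
    path, steering = sample
    return path, steering + rng.gauss(0.0, sigma)

def batch_generator(samples, augment, batch_size=BATCH_SIZE, seed=0):
    """Yield endless batches of freshly augmented samples."""
    rng = random.Random(seed)
    while True:
        yield [augment(rng.choice(samples), rng) for _ in range(batch_size)]

batches_per_epoch = SAMPLES_PER_EPOCH // BATCH_SIZE   # 400 batches of 112 samples
total_samples = SAMPLES_PER_EPOCH * EPOCHS            # 1 344 000 augmented samples overall

train, validation = train_validation_split(list(range(12098)))
batch = next(batch_generator([("center_1.jpg", 0.1)], jitter_steering))
print(batches_per_epoch, total_samples, len(train), len(validation), len(batch))
```

Because every batch is generated on demand, the number of samples per epoch is a free choice and does not have to match the size of the recorded dataset.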
How to improve the model? There are different ways. First, we need a much more sophisticated environment in the simulator, including many other types of turns and crossings, road materials, off-road places, and weather effects like rain, snow, fog, ice, etc. We also need other cars, pedestrians, bicyclists, motorcycles and many other traffic participants. It would possibly be good to have a function that can instantly place the car in a random road situation, which would be a way to get a dataset richer in behavior. Maybe it would be helpful to have a simulator that can randomly generate the whole environment on request. Second, since we may have multiple camera images from a car, we may use 3D reconstruction to generate samples with a changed point of view, as is done in the Nvidia paper or in more detail. In this case we may need proper camera calibration parameters and camera placement geometry. I think further improvement may be reached with reinforcement learning, using the model as a pretrained actor in an actor-critic approach. With RL the model may correct driving errors that it learned from the behavioral cloning dataset and possibly find a smarter way of driving.
- Record your dataset. It is a CSV file with records of the form
  center image path, left image path, right image path, steering, throttle, brake, speed
  Only the center image path and steering columns are required; the other columns may contain arbitrary values.
- Use model.py or the Behavioural-Cloning-clean.ipynb IPython notebook to train your model. Don't forget to change the dataset loading part to point to your dataset location. A GPU is strongly recommended.
- Select autonomous mode and a track in the simulator. Run the command
  python drive.py model.json
  An already trained model is included. You can try it :)
Warning: the simulator is not synchronized with drive.py. Make sure that the simulator and drive.py run without lag: use the lowest quality setting in the simulator and, if possible, a GPU for the CNN. Lag can significantly affect driving and make the driving quality look very poor.
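The CSV record layout from the first step could be read with something like the sketch below. It is not the loading code from model.py. I assume that the file has no header row, and the field names, the function name `read_driving_log` and the example paths are all made up for illustration. Only the center image path and the steering value are used, as the text says the other columns may hold arbitrary values.

```python
import csv
import io

# Assumed column order, following the record layout described in the first step.
FIELDS = ["center", "left", "right", "steering", "throttle", "brake", "speed"]

def read_driving_log(stream):
    """Return a list of (center image path, steering) tuples from a CSV stream."""
    samples = []
    for row in csv.reader(stream, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, row))
        samples.append((record["center"].strip(), float(record["steering"])))
    return samples

# Hypothetical records; the second one has arbitrary values in the unused columns.
example = io.StringIO(
    "IMG/center_1.jpg, IMG/left_1.jpg, IMG/right_1.jpg, 0.05, 0.9, 0, 30.1\n"
    "IMG/center_2.jpg, x, x, -0.12, x, x, x\n"
)
samples = read_driving_log(example)
print(samples)
```

For a real recording, the same function can be called with an opened file, for example `read_driving_log(open(path, newline=""))`, where `path` points to your own dataset.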
I found this approach very useful, and I think it can be used in other simulators and games such as GTA or TORCS, or maybe even in other types of games.