Testing out reward functions. This also works as a journal of how I reached the top 50s in the AWS Student League :D
First try, with 10 minutes of training.
Tried 60 minutes of training on the new method (point along the tangent of the two closest waypoints). It failed because the two waypoints are too far apart. A great lesson: when you test a new reward function, build an MVP and check the result first. Instead of waiting 60 minutes, 20 minutes is enough to see how the new method behaves.
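For reference, here is a minimal sketch of the tangent idea, assuming the standard DeepRacer input parameters `waypoints`, `closest_waypoints`, and `heading`. The helper names and the linear falloff are my own illustration, not the exact code I trained with.

```python
import math


def tangent_direction(waypoints, closest_waypoints):
    """Angle in degrees (-180..180) of the segment joining the two closest waypoints."""
    prev_x, prev_y = waypoints[closest_waypoints[0]]
    next_x, next_y = waypoints[closest_waypoints[1]]
    return math.degrees(math.atan2(next_y - prev_y, next_x - prev_x))


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = abs(a - b) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def reward_function(params):
    target = tangent_direction(params["waypoints"], params["closest_waypoints"])
    error = angle_diff(target, params["heading"])
    # Assumption: reward falls off linearly from 1.0 (aligned) to ~0 (opposite).
    return float(max(1e-3, 1.0 - error / 180.0))
```

When the two waypoints are far apart, this tangent is only a coarse approximation of the track direction, which fits the failure described above.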
For video (link)
20 minutes of training focusing on the difference between heading and car_next_waypoint_degree. The model gets a heavy penalty (reward *= 0.1) if the difference is larger than the threshold (40 degrees). It turns out that when it reaches a corner, the model concludes that its previous behavior was completely wrong, so it tends to make a hard right turn even when it should keep turning left.
Same as Attempt_3 but with a lighter penalty, which makes the model turn less sharply than the previous one. However, the model becomes numb to the penalties and needs more time to train. Still, this idea won't work.
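A rough sketch of the heading penalty, assuming `car_next_waypoint_degree` means the angle from the car's position (`x`, `y`) to the next waypoint. That reading and the constant names are assumptions; the 40 degree threshold and the 0.1 multiplier come from the entry above.

```python
import math

ANGLE_THRESHOLD = 40.0  # degrees, from the journal entry
HEAVY_PENALTY = 0.1     # reward multiplier, from the journal entry


def reward_function(params):
    x, y = params["x"], params["y"]
    next_x, next_y = params["waypoints"][params["closest_waypoints"][1]]

    # Assumed meaning: direction from the car to the next waypoint, in degrees.
    car_next_waypoint_degree = math.degrees(math.atan2(next_y - y, next_x - x))

    # Wrap the difference into 0..180 so that -170 vs 170 counts as 20, not 340.
    diff = abs(params["heading"] - car_next_waypoint_degree) % 360.0
    if diff > 180.0:
        diff = 360.0 - diff

    reward = 1.0
    if diff > ANGLE_THRESHOLD:
        reward *= HEAVY_PENALTY
    return float(reward)
```

Because the penalty is a hard cliff at the threshold, a car entering a corner can suddenly drop from full reward to a tenth of it, which may explain the overcorrection seen in this attempt.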
Video (link)
The problem with all my attempts so far is that the model doesn't know what to do when it approaches a corner. So I decided to give a penalty if the car stays on the right side of the track ahead of a left-turning corner, and vice versa. Another feature compares the progress difference between now and 30 seconds before: the model gets more reward if it progressed more than before.
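The corner-side penalty could look roughly like this. It is a sketch assuming the standard `waypoints`, `closest_waypoints`, and `is_left_of_center` parameters; `LOOKAHEAD` and `TURN_THRESHOLD` are hypothetical values chosen for illustration, and 0.7 is the wrong-side penalty that later entries mention.

```python
import math

LOOKAHEAD = 5             # hypothetical: how many waypoints ahead to look
TURN_THRESHOLD = 10.0     # hypothetical: degrees of direction change that count as a corner
WRONG_SIDE_PENALTY = 0.7  # multiplier mentioned in later entries


def segment_direction(waypoints, i):
    """Direction in degrees of the segment from waypoint i to waypoint i + 1 (wrapping)."""
    n = len(waypoints)
    x0, y0 = waypoints[i % n]
    x1, y1 = waypoints[(i + 1) % n]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def upcoming_turn(waypoints, closest_waypoints):
    """Return 'left', 'right', or 'straight' for the track ahead."""
    here = segment_direction(waypoints, closest_waypoints[0])
    ahead = segment_direction(waypoints, closest_waypoints[0] + LOOKAHEAD)
    # Signed change in -180..180; positive means counterclockwise, i.e. a left turn.
    change = (ahead - here + 180.0) % 360.0 - 180.0
    if change > TURN_THRESHOLD:
        return "left"
    if change < -TURN_THRESHOLD:
        return "right"
    return "straight"


def reward_function(params):
    turn = upcoming_turn(params["waypoints"], params["closest_waypoints"])
    on_left = params["is_left_of_center"]

    reward = 1.0
    # Penalize the right side before a left corner, and the left side before a right corner.
    if (turn == "left" and not on_left) or (turn == "right" and on_left):
        reward *= WRONG_SIDE_PENALTY
    return float(reward)
```

Looking a few waypoints ahead gives the model a signal before it is already inside the corner, which is what the earlier attempts lacked.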
Video(Link)
YESSS!! It only went off-track 4 times and I got a really good score of 3:17. It is close to the Udacity nanodegree requirement. Let's keep the good pace going. :)
There are three ways to fix the problem of running off-track. First, more reward if the car is closer to the center line. Second, more reward for staying on either the right side or the left side. Third, more training time.
Simply adding 10 minutes of training does help the model perform better.
Less penalty if the car is on the unwanted side: 0.7 -> 0.8. Try to stay closer to the center: shrink marker_2 0.3 -> 0.25.
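The first option can be sketched with the usual centerline-band pattern, assuming the standard `track_width` and `distance_from_center` parameters. The marker fractions and reward values here are illustrative assumptions, not the tuned values from any specific attempt.

```python
def reward_function(params):
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Bands at increasing distance from the center line (fractions are assumptions).
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track
    return float(reward)
```

Shrinking a marker narrows its band, so the car has to stay closer to the center to earn the same reward.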
Video(link)
After testing, I really think vague commands need more training time.
Back to the original penalty if the car is on the unwanted side: 0.8 -> 0.7. Less reward on marker_2: 0.9 -> 0.8.
Video(Link)
I don't know what I did wrong.....
GAS GAS GAS! I'm gonna step on the gas. Clone of Fifth Attempt. Added a speed detection function.
Video(link)
Maybe I have to train it from the beginning to speed it up.
Speed up with progress and avg_speed.
Video(link)
The run is almost as good as the Fifth Attempt, but the speed didn't improve significantly.
Canceled
30 mins of training. Check speed every 3 seconds and multiply the reward by 1.32. Progress_diff turned up to *1.2. No penalty while making a right turn.
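A sketch of how a periodic speed and progress check might be wired up, assuming the standard `steps`, `speed`, and `progress` parameters and roughly 15 steps per second. The step rate, the `IntervalTracker` class, and the "better than the previous window" rule are all assumptions made for illustration; only the 3 second interval and the 1.32 and 1.2 multipliers come from the entry above.

```python
STEPS_PER_SECOND = 15       # assumption about the simulator's step rate
CHECK_INTERVAL_SECONDS = 3  # from the journal entry
SPEED_BONUS = 1.32          # from the journal entry
PROGRESS_BONUS = 1.2        # from the journal entry


class IntervalTracker:
    """Hypothetical helper that compares each window with the previous one."""

    def __init__(self, interval_steps):
        self.interval_steps = interval_steps
        self.last_steps = None
        self.reset()

    def reset(self):
        self.speed_sum = 0.0
        self.samples = 0
        self.window_start_progress = 0.0
        self.prev_avg_speed = None
        self.prev_progress_gain = None

    def update(self, steps, speed, progress):
        """Return a reward multiplier; above 1.0 only when a window ends with an improvement."""
        if self.last_steps is None or steps < self.last_steps:
            self.reset()  # the step counter went backwards, so a new episode started
        self.last_steps = steps

        self.speed_sum += speed
        self.samples += 1
        if self.samples < self.interval_steps:
            return 1.0

        avg_speed = self.speed_sum / self.samples
        progress_gain = progress - self.window_start_progress
        multiplier = 1.0
        if self.prev_avg_speed is not None and avg_speed > self.prev_avg_speed:
            multiplier *= SPEED_BONUS
        if self.prev_progress_gain is not None and progress_gain > self.prev_progress_gain:
            multiplier *= PROGRESS_BONUS

        # Start the next window.
        self.prev_avg_speed = avg_speed
        self.prev_progress_gain = progress_gain
        self.window_start_progress = progress
        self.speed_sum = 0.0
        self.samples = 0
        return multiplier


_tracker = IntervalTracker(STEPS_PER_SECOND * CHECK_INTERVAL_SECONDS)


def reward_function(params):
    reward = 1.0
    reward *= _tracker.update(params["steps"], params["speed"], params["progress"])
    return float(reward)
```

The state lives at module level because the reward function is called once per step, so anything that compares "now" with "a few seconds ago" has to be remembered between calls.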
Video(link)
Speed every 2 seconds *1.22. Progress_diff *1.1. More reward for sticking toward the border while turning *1.15. 20 mins of training.
Video(link)
20 more mins of training with a clone of the 12th.
Video(link)
20 mins. Speed every 1 second. Turn left penalty 0.7 -> 0.6. Remove speed up by progress.
20 more mins training clone of 14th.
20 more mins training clone of 14th.
20 more mins training clone of 14th. But track_angle 10 -> 5
Less reward for being on the right side of the track 1.1 -> 1.05. Init waypoint set to 0 instead of 1. Marker_2 *0.8
20 more mins training clone of 18th.
Video(link)
It still keeps running off track at the same spot.
15 mins. Right side reward 1.05 -> 1.03. Turn left angle 5 -> 18
10 mins Turn left angle 18 -> 0
Clone of 15th attempt. Right side bonus only on straight line or turning right.
Clone of 22nd attempt. Straight line marker_2 0.75 -> 0.72. Left turn distance from center 1.15 -> 1.18
Clone of 23rd attempt. Right side reward 1.03 -> 1.01. Left turn distance from center 1.18 -> 1.2
20 mins. marker_1*1.22 marker_2 * 1.2
Video(link)
Personal Best :D
20 mins. clone of 25th attempt. Right side reward 1.02. marker2 when turning left 1.21.
Video(link)
Broke my personal best again!
20 mins. Clone of 26th attempt. Right side reward 1.02 -> 1.01. Left side reward turning left 1.0 -> 1.02
20 mins. Clone of 26th attempt. Track_4 *0.6 Track_3 0.1 -> 0.8 marker_1 1.22 -> 1.25
55 mins. Same as 28th, but I think the problem with the 28th attempt is that it keeps the memory from the past. That's why it keeps failing. Last try of this month.
Didn't go off track. That is the happiest thing ever :D Unfortunately, the speed isn't fast enough. The best record is 03:06.399. If I can reach the finish line 7 seconds earlier, I will be eligible for the Udacity nanodegree sponsorship. Let's just put this aside and join again next month; you will see me come back in June 2022. Stay tuned. :)