In this project, I tackled the travel time optimization problem for taxi vehicles. This can be framed as the Traveling Salesman Problem, a well-known computer science problem. The objective is to find the shortest route that visits a set of locations. For this problem, optimization techniques are required to intelligently search the solution space and find near-optimal solutions. More specifically, I first used XGBoost model to predict travel times between every pair of pick-up and drop-off locations. Then, I used evolutionary algorithms, namely ant colony and genetic, to find the best travel itinerary for the vehicles in the data.
An accompanied blog post on Medium can be found at this link: Using Ant Colony and Genetic Evolution to Optimize Ride-Sharing Trip Duration
The data is the NYC Taxi and Limousine Commission Trip Record data, already downloaded on Kaggle. I have the monthly data in the year 2016 for yellow taxi, green taxi, and for-hire vehicles. The dataset has close to 1.5 million trip records with 11 attributes.
- python3
- matplotlib
- numpy
- pandas
- xgboost
- geopy
$ python3 -m pip install xgboost
$ python3 -m pip install geopy
$ python3 basicAnalysis.py
$ python3 sophisticatedAnalysis.py
$ python3 XGBoost.py
$ python3 aco_main.py -h
usage: aco_main.py [-h] [--verbose] loc_count ant_count g alpha beta rho q
positional arguments:
loc_count number of locations (default is 15)
ant_count number of ants to use (default is 10)
g number of generations (default is 100)
alpha relative importance of pheromone (default is 1.0)
beta relative importance of heuristic information (default is 10.0)
rho pheromone residual coefficient (default is 0.5)
q pheromone intensity (default is 10.0)
optional arguments:
-h, --help show this help message and exit
--verbose print out each generation cost and best path
$ python3 genetic_evo_main.py -h
usage: genetic_evo_main.py [-h] [--verbose] loc_count n g m c
positional arguments:
loc_count number of locations (default is 15)
n population size (default is 10)
g number of generations (default is 100)
m mutation factor (default is 0.5)
c crossover rate (default is 0.7)
optional arguments:
-h, --help show this help message and exit
--verbose print out each generation cost and best path