Efficient autonomous exploration of uncharted terrains, such as the Martian surface, presents significant challenges in robotics and artificial intelligence. This project addresses the task of navigation and mapping in Mars-like environments by leveraging Deep Reinforcement Learning (DRL). Specifically, it explores and evaluates the performance of Twin Delayed Deep Deterministic Policy Gradient (TD3) and Proximal Policy Optimization (PPO) algorithms within the simulated MarsExplorer environment. The findings demonstrate enhanced terrain exploration, effective obstacle avoidance, and high coverage rates under defined constraints.
Requires Python version <= 3.10.
You can install the MarsExplorer environment with the following commands:
- Clone the repository.
git clone https://github.com/GouriRajesh/Reinforcement-Learning-Based-Autonomous-Mars-Terrain-Exploration-and-Navigation-Framework.git
- Move into the project root directory.
cd Reinforcement-Learning-Based-Autonomous-Mars-Terrain-Exploration-and-Navigation-Framework
- Install the package.
pip install gym==0.17.3
pip install -e mars-explorer
- Install the dependencies.
For macOS users:
sh setup.sh
For Windows users (from a Bash-compatible shell such as Git Bash or WSL):
bash setup.sh
A full list of the dependencies is available at:
setup/environment.yml
Please run the following command to make sure that everything works as expected:
python mars-explorer/tests/test.py
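Beyond the test script, the sketch below creates and steps the environment directly as a quick sanity check. The environment id used here is an assumption; check mars-explorer/tests/test.py for the id the package actually registers.

```python
# Minimal sanity-check sketch (gym 0.17.x API).
# NOTE: "mars_explorer:exploConf-v1" is an assumed id -- use the id from the test script.
import gym

env = gym.make("mars_explorer:exploConf-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random actions, just to exercise the env
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode finished, total reward:", total_reward)
env.close()
```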
We have included manual control of the agent via the arrow keys. Run the manual-control environment with:
python mars-explorer/tests/manual.py
To train your own agents, use the commands below.
For TD3:
python td3/td3_train.py
For PPO:
python ppo/ppo_train.py
For Baseline DQN:
python td3/baseline_dqn_train.py
For Baseline PPO:
python ppo/baseline_ppo_train.py
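Conceptually, each of these scripts runs an episode-based training loop along the lines of the sketch below. This is only a simplified illustration assuming a generic agent with select_action() and update() methods and an assumed environment id; the actual loops live in td3/td3_train.py and ppo/ppo_train.py and may differ in detail.

```python
# Simplified training-loop sketch (not the repository's exact code).
# The environment id and the agent interface are assumptions for illustration.
import gym

def train(agent, env_id="mars_explorer:exploConf-v1", episodes=1000):
    env = gym.make(env_id)
    episode_rewards = []
    for _ in range(episodes):
        obs, done, ep_reward = env.reset(), False, 0.0
        while not done:
            action = agent.select_action(obs)                  # policy rollout
            next_obs, reward, done, _ = env.step(action)
            agent.update(obs, action, reward, next_obs, done)  # learning step(s)
            obs = next_obs
            ep_reward += reward
        episode_rewards.append(ep_reward)
    env.close()
    return episode_rewards
```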
All training results are saved as pickle files at:
~/training_results
The trained models are saved as .pth files at:
~/trained_models
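A minimal sketch of inspecting the saved outputs is shown below; it assumes the pickle holds per-episode rewards and the .pth file holds a PyTorch state_dict, and the file names are hypothetical examples, so adjust them to your run.

```python
# Sketch of loading saved outputs; file names are hypothetical examples.
import pickle
import torch

with open("training_results/td3_rewards.pkl", "rb") as f:   # assumed file name
    rewards = pickle.load(f)
print("episodes:", len(rewards), "best episode reward:", max(rewards))

state_dict = torch.load("trained_models/td3_actor.pth", map_location="cpu")  # assumed file name
print("saved parameter tensors:", list(state_dict)[:5])
```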
To view your training results, run the commands below:
| Serial No. | Results | Command |
|---|---|---|
| 1 | TD3 Rewards, Actor & Critic Loss | python td3/td3_results.py |
| 2 | TD3 Percentage Area Covered | python td3/td3_percentage_area_covered_result.py |
| 3 | TD3 vs Baseline DQN | python td3/td3_vs_baseline_dqn_result.py |
| 4 | PPO Rewards, Actor Loss | python ppo/ppo_results.py |
| 5 | PPO Percentage Area Covered | python ppo/ppo_percentage_area_covered_result.py |
| 6 | PPO vs Baseline PPO | python ppo/ppo_vs_baseline_ppo_result.py |
| 7 | PPO vs TD3 | python ppo/ppo_vs_td3_result.py |
| 8 | PPO vs TD3 vs Baseline PPO vs Baseline DQN | python td3/ppo_bppo_td3_dqn_comparison_result.py |
To train with different learning rates and view the results, use the commands below.
For TD3:
python td3/td3_lr_test.py
For PPO:
python ppo/ppo_lr_test.py
All figures produced by the plotting scripts described above are saved at:
~/plot_figs
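If you want a custom view in addition to the saved figures, a minimal sketch of plotting a reward curve from one of the pickle files is shown below; the input and output file names are assumptions.

```python
# Sketch of a custom reward-curve plot; file names are assumptions.
import pickle
import matplotlib.pyplot as plt

with open("training_results/ppo_rewards.pkl", "rb") as f:   # assumed file name
    rewards = pickle.load(f)

plt.plot(rewards)
plt.xlabel("Episode")
plt.ylabel("Episode reward")
plt.title("PPO training reward")
plt.savefig("plot_figs/custom_reward_curve.png")             # assumed output path
```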
Here is a short video of what the PPO training looks like: