This project implements a Deep Q-Network (DQN) agent to solve OpenAI Gym's Lunar Lander environment. The agent learns to safely land a lunar module on a landing pad using reinforcement learning techniques.
The landing pad is marked by two flag poles and is centered at coordinates (0, 0). The lander starts at the top center of the environment with a random initial force applied to it and has infinite fuel.
The agent receives an 8-dimensional state observation:
- Coordinates (x, y)
- Linear velocities (x, y)
- Angle
- Angular velocity
- Boolean flags for left and right leg ground contact
The agent can take 4 discrete actions:
- 0: Do nothing
- 1: Fire left orientation engine
- 2: Fire main engine
- 3: Fire right orientation engine
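A quick way to sanity-check these spaces is to step the environment directly. This is a minimal sketch assuming `gym==0.24.0`, where `reset()` returns only the observation and `step()` returns a 4-tuple:

```python
import gym

env = gym.make('LunarLander-v2')

state = env.reset()                # 8-dimensional observation
print(state.shape)                 # (8,)
print(env.action_space.n)          # 4 discrete actions

# Fire the main engine for one step
next_state, reward, done, info = env.step(2)
```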
The reward structure includes:
- Proximity rewards based on distance to landing pad
- Velocity-based rewards (slower movement is better)
- Penalties for tilting away from horizontal
- +10 points for each leg contact with ground
- -0.03 points per frame for side engine usage
- -0.3 points per frame for main engine usage
- Terminal rewards: +100 for safe landing, -100 for crash
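To see these rewards accumulate, here is a short sketch of a single random-policy episode (a random agent typically crashes and ends with the -100 terminal reward):

```python
import gym

env = gym.make('LunarLander-v2')
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()         # random policy
    state, reward, done, _ = env.step(action)
    total_reward += reward
print(f"Episode return: {total_reward:.1f}")   # usually well below 0 for random play
```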
- `lunar_lander.py`: Main implementation file
- `utils.py`: Helper functions for the project

Requirements:
- numpy
- tensorflow
- gym (version 0.24.0)
- PIL
- pyvirtualdisplay
- Input layer: State size (8 dimensions)
- Hidden layers: 2 dense layers with 64 units each (ReLU activation)
- Output layer: 4 units (linear activation) corresponding to actions
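The architecture above maps directly to a small Keras model. This is a sketch with the layer sizes as listed; everything beyond the sizes and activations is an assumption:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

STATE_SIZE, NUM_ACTIONS = 8, 4

q_network = Sequential([
    Input(shape=(STATE_SIZE,)),
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(NUM_ACTIONS, activation='linear'),  # one Q-value per action
])
```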
- Experience Replay: Stores agent experiences in a memory buffer to break correlations between consecutive samples (see the sketch after this list)
- Target Network: Separate network for stable Q-value targets
- ε-greedy Exploration: Balanced exploration-exploitation strategy
- Custom video generation: after training, you can run the cells below to render a video of the trained lander; because the initial force is random, each video shows a different landing
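A minimal sketch of how the replay buffer, ε-greedy policy, and target network typically fit together. The buffer size, τ, and helper names here are assumptions, not the project's exact code:

```python
import random
from collections import deque, namedtuple

import numpy as np

Experience = namedtuple('Experience', ['state', 'action', 'reward', 'next_state', 'done'])
memory = deque(maxlen=100_000)  # experience replay buffer (size is an assumption)

def epsilon_greedy_action(q_network, state, epsilon, num_actions=4):
    """ε-greedy: explore with probability ε, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    q_values = q_network(np.expand_dims(state, axis=0))
    return int(np.argmax(q_values))

def soft_update(target_network, q_network, tau=1e-3):
    """Slowly track the online network to keep Q-value targets stable (τ is an assumption)."""
    for target_w, w in zip(target_network.weights, q_network.weights):
        target_w.assign(tau * w + (1.0 - tau) * target_w)
```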
- Episodes: 2000
- Max timesteps per episode: 1000
- Memory buffer size: Defined in hyperparameters
- Batch size: 64
- Learning rate (α): Defined in hyperparameters
- Epsilon decay: Progressive reduction from 1.0 to 0.01
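One common way to implement that decay is a multiplicative schedule; the rate 0.995 below is an assumption, and the project's exact schedule may differ:

```python
EPSILON_START, EPSILON_MIN = 1.0, 0.01
EPSILON_DECAY = 0.995  # assumed decay rate per episode

epsilon = EPSILON_START
for episode in range(2000):
    # ... run one episode with ε-greedy action selection ...
    epsilon = max(EPSILON_MIN, EPSILON_DECAY * epsilon)
```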
The environment is considered solved when the agent achieves an average score of 200 points over 100 consecutive episodes.
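The solve check can be implemented with a rolling window over the last 100 episode scores. A sketch, with illustrative variable names (`recent_scores`, `total_reward`, `episode` are not taken from the project's code):

```python
from collections import deque

import numpy as np

recent_scores = deque(maxlen=100)  # keeps only the last 100 episode scores

# inside the training loop, after each episode finishes:
recent_scores.append(total_reward)
if len(recent_scores) == 100 and np.mean(recent_scores) >= 200.0:
    print(f"Environment solved in {episode} episodes!")
```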
```python
import gym

# Initialize the environment
env = gym.make('LunarLander-v2')

# Train the agent
trained_agent = train_dqn_agent(env)

# Watch the trained agent land
create_video(trained_agent, env, "lunar_landing.mp4")
```
```bash
pip install numpy tensorflow gym==0.24.0 pillow pyvirtualdisplay
```
- The environment applies a random initial force to the lander, so each landing attempt will be different
- Training time can vary based on hardware capabilities
- The implementation includes visualization capabilities for monitoring training progress