Driving-R1: The Aha Moment for LLM-based Autonomous Driving

This is an autonomous driving agent training project built on the Llama 3.1 8B model. The project uses GRPO (Group Relative Policy Optimization) to train the model to generate appropriate control instructions from scene descriptions.

Motivation & Background

This project attempts to reproduce the DeepSeek-R1 "Aha Moment" in the context of LLM-based autonomous driving. The implementation was migrated from Colab and is built on Unsloth's framework. The pipeline (on Colab) has been tested with both Llama and Qwen models at the 8B-parameter scale (by adjusting the chat template and prompt formats), though this repository is still under development.

Thanks to Unsloth's optimizations, these models can be trained efficiently with PEFT (Parameter-Efficient Fine-Tuning) using only about 14 GB of GPU memory, making research and experimentation feasible on consumer-grade hardware.
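
A minimal sketch of how the base model might be loaded for 4-bit, LoRA-based training with Unsloth; the model name, LoRA rank, and target modules below are illustrative assumptions, not the repository's exact configuration:

from unsloth import FastLanguageModel

# Load the base model in 4-bit to keep memory usage low (model name assumed).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of parameters is trained (PEFT).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (assumption)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)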

Limitations & Considerations

While this serves as a proof-of-concept implementation of R1 for autonomous driving, the current dataset has significant limitations: the training data lacks velocity and other important geometric parameters that would be essential for real-world autonomous driving. This implementation should therefore be considered a minimal reproduction for educational and research purposes.

Project Structure

.
├── README.md
├── requirements.txt
├── train.py
└── src/
    ├── data_processing.py
    ├── model.py
    └── reward_functions.py

Key Features

  • Data Processing: Converts raw driving-scenario data into the required training format
  • Model Training: Trains the Llama model with GRPO
  • Reward Functions: Control-value rewards and XML-format rewards (see the sketch below)
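
As a rough illustration of the XML-format reward, a function shaped like the one below could score whether a completion follows the expected <reasoning>/<answer> template. The function name, the 0.5 score, and the assumption that completions arrive in TRL's conversational format are illustrative, not the project's actual implementation:

import re

# Score 0.5 when the completion matches the <reasoning>/<answer> template, else 0.0 (values assumed).
def xml_format_reward(completions, **kwargs):
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]  # conversational format assumed
    return [0.5 if re.search(pattern, response, re.DOTALL) else 0.0 for response in responses]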

Dataset

The training dataset (vqa_test_1k.pkl) is sourced from Wayve's Driving-with-LLMs repository. This dataset contains driving scenarios with corresponding control instructions.
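
The file is a standard Python pickle, so its schema can be inspected before it is wired into data_processing.py; the snippet below only loads and prints it and makes no assumptions about the record keys:

import pickle

# Load the raw dataset; each entry is expected to describe one driving scenario.
with open("vqa_test_1k.pkl", "rb") as f:
    records = pickle.load(f)

print(len(records))  # number of scenarios
print(records[0])    # inspect one record to see the actual fields before writing the conversion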

Installation

pip install -r requirements.txt

Usage

  1. Data Preparation:

    • Ensure you have the vqa_test_1k.pkl data file
    • The data should include scene descriptions and corresponding control instructions
  2. Training (a rough sketch of the training wiring is shown below):

    python train.py
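
As a hedged sketch of what the wiring inside train.py might look like, the setup below uses TRL's GRPOTrainer (which Unsloth's GRPO examples build on); the hyperparameters, the train_dataset variable, and the single reward function are placeholders, not the repository's actual values:

from trl import GRPOConfig, GRPOTrainer

# `model` and `tokenizer` come from the Unsloth setup sketched earlier;
# `train_dataset` is the processed dataset and `xml_format_reward` is the reward sketched above.
training_args = GRPOConfig(
    output_dir="outputs",
    learning_rate=1e-5,          # assumption
    num_generations=4,           # completions sampled per prompt for GRPO (assumption)
    max_completion_length=256,   # assumption
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[xml_format_reward],  # the real project also uses a control-value reward
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()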

Model Input/Output Format

Input Format:

Scene description text

Output Format:

<reasoning>
Reasoning process
</reasoning>
<answer>
longitudinal: {acceleration value between 0 and 1}, lateral: {steering value between -1 and 1}
</answer>
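
Downstream, the structured output can be parsed back into numeric controls; a minimal sketch (the function name and fallback behaviour are assumptions):

import re

# Extract longitudinal/lateral values from the <answer> block; returns None if the format is not followed.
def parse_controls(text):
    match = re.search(
        r"<answer>\s*longitudinal:\s*([-+]?\d*\.?\d+),\s*lateral:\s*([-+]?\d*\.?\d+)\s*</answer>",
        text,
        re.DOTALL,
    )
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))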

Notes

  • The model is loaded with 4-bit quantization to reduce memory usage
  • LoRA is used for parameter-efficient fine-tuning
  • Sufficient GPU memory (roughly 14 GB with this setup) is required to run the 8B-parameter model
