Name	Name	Last commit message	Last commit date
parent directory ..
langton_ant_dataset_example	langton_ant_dataset_example
README.md	README.md
dataset_generator.py	dataset_generator.py

Langton's Ant VQA Dataset Generator

Overview

Langton's Ant VQA Dataset Generator is a tool designed to simulate the classic cellular automaton Langton's Ant and generate a comprehensive Visual Question Answering (VQA) dataset. It creates game board images featuring the ant's position, direction, and the current state of cells (black or white). Based on these images, it generates different types of questions about the current state and future states of the system. The generated VQA dataset includes game board images, questions, multiple-choice answers, and detailed step-by-step analyses, making it suitable for training multimodal models.

An example game image:

Features

Dynamic Grid Sizes: Supports multiple grid sizes (5x5, 9x9, 13x13) corresponding to different difficulty levels
Coordinate System: Clear coordinate labeling system for precise position reference
Visual Elements:
- White/black cells representing the board state
- Red arrow indicating ant's position and direction
- Coordinate labels for easy reference
Multiple Question Types: Generates various types of questions about current state and future states
Detailed Analysis: Provides step-by-step explanations for each question

Game Rules

In Langton's Ant, we have a grid where each cell is either white or black. A red arrow represents an ant, showing its current position and direction. The ant follows these simple rules:

If the ant is on a white cell, it turns right 90 degrees, changes the cell to black, and moves forward one step
If the ant is on a black cell, it turns left 90 degrees, changes the cell to white, and moves forward one step The coordinates system: The top-left cell is (0,0), with x increasing downward and y increasing rightward.

Project Structure

generate_dataset.py: Main script containing the LangtonVQAGenerator class
/images: Directory containing generated game board images
/states: Directory containing JSON files with board states
data.json: Output JSON file containing the complete dataset

Output Contents

The generator creates three types of output:

Images (PNG):
- Game board visualizations with ant, cells, and coordinates
- Stored in /images directory
States (JSON):
- Complete board state information
- Includes grid configuration and ant state
- 0 indicates white, 1 indicates black
- Stored in /states directory
Dataset (JSON):
- Questions, answers, and analyses
- References to corresponding images and states
- Stored as data.json

Supported Question Types

By Type Category

Target Perception:
- Current position and direction of the ant
- Difficulty: Easy
State Prediction:
- Predict ant's position and direction after several steps
- Predict cell color changes after several steps
- Difficulty: Medium to Hard

By Difficulty Level

Easy:
- plot_size: 5x5 grid
- qa_type: Current state identification questions
Medium:
- plot_size: 9x9 grid
- qa_type: Movement prediction questions
Hard:
- plot_size: 13x13 grid
- qa_type: Cell state prediction questions

Usage

Prerequisites

pip install pygame

Basic Usage

python dataset_generator.py

And it will automatically generate a file branch like:

foo/ # Your current working directory

├── images/ # images of the game board

├── states/ # JSON files containing the board states

└── data.json # VQA dataset

Adjustable Parameters

Dataset Size:

dataset_size: Number of samples to generate per difficulty level, can be adjusted at:

data += generate_mcq_dataset(dataset_size=1, options_num=8)
data += generate_fill_dataset(dataset_size=1)

Grid Sizes:

GRID_SIZES: Dictionary containing the grid sizes for each difficulty level, can be adjusted at:

GRID_SIZES = {
    "Easy": 5,    # Can be modified
    "Medium": 9,  # Can be modified
    "Hard": 13    # Can be modified
}

Steps:
- steps: Number of steps for question 2 and 3, can be adjusted at:
```
steps = random.randint(5, 12)
```
options_num
- options_num: Number of options for multiple-choice questions, can be adjusted at:
```
data += generate_mcq_dataset(dataset_size=1, options_num=8)
```

Additional Notes

The generator ensures periodic boundary conditions (the ant wraps around the grid edges)
Questions are generated with varying complexity based on the number of steps
Each sample includes multiple questions of different types
Images are generated with clear visual indicators and coordinate systems

License

This project is licensed under the MIT License - see the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Langton's Ant VQA Dataset Generator

Overview

Features

Game Rules

Project Structure

Output Contents

Supported Question Types

By Type Category

By Difficulty Level

Usage

Prerequisites

Basic Usage

Adjustable Parameters

Additional Notes

License

FilesExpand file tree

langton_ant

Directory actions

More options

Directory actions

More options

Latest commit

History

langton_ant

Folders and files

parent directory

README.md

Langton's Ant VQA Dataset Generator

Overview

Features

Game Rules

Project Structure

Output Contents

Supported Question Types

By Type Category

By Difficulty Level

Usage

Prerequisites

Basic Usage

Adjustable Parameters

Additional Notes

License