
Question about dataset generation. #191

Open
Yyb-XJTU opened this issue Nov 28, 2024 · 5 comments

Comments

@Yyb-XJTU

Your processing steps first convert the db files into pkl files and then generate a dataset in arrow format. However, the original nuPlan dataset is hierarchical (organized by map). Do I need to process each part one by one?

@JohnZhan2023
Collaborator

JohnZhan2023 commented Nov 29, 2024

No, the two parts are completely independent. You can run the following concurrently:

    python generation.py --num_proc 40 --sample_interval 100 \
        --dataset_name boston_index_demo --starting_file_num 0 \
        --ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
        --data_path {PATH_TO_DATASET_FOLDER} --only_data_dic

to generate the pkl files, and

    python generation.py --num_proc 40 --sample_interval 100 \
        --dataset_name boston_index_interval100 --starting_file_num 0 \
        --ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
        --data_path {PATH_TO_DATASET_FOLDER} --only_index

to generate the arrow dataset.
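
Since the two runs are independent, here is a minimal shell sketch that launches both in parallel (the placeholder paths are the same as above and must be filled in; the `&`/`wait` pattern is just one way to run them concurrently):

    # Launch both generation steps as background jobs and wait for both.
    # {PATH_TO_CACHE_FOLDER} and {PATH_TO_DATASET_FOLDER} are placeholders.
    python generation.py --num_proc 40 --sample_interval 100 \
        --dataset_name boston_index_demo --starting_file_num 0 \
        --ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
        --data_path {PATH_TO_DATASET_FOLDER} --only_data_dic &
    python generation.py --num_proc 40 --sample_interval 100 \
        --dataset_name boston_index_interval100 --starting_file_num 0 \
        --ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
        --data_path {PATH_TO_DATASET_FOLDER} --only_index &
    wait  # block until both background jobs finish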

@Yyb-XJTU
Author

Thanks for your reply. Should I split the nuPlan dataset into train, val, and test? Then, should I perform the above two steps to generate the pkl and arrow files for each subset?

@JohnZhan2023
Collaborator

You don't need to process each subset separately; the script will automatically generate all the subsets.

@Yyb-XJTU
Author

Yyb-XJTU commented Dec 1, 2024

This is my nuPlan dataset file structure (all db files in one folder):
[screenshot of the directory listing]
I read the generation.py code and found that each subset (train, val, and test) needs to be processed separately; there is no automatic processing logic.

@JohnZhan2023
Collaborator

Thank you for pointing out my mistake. You are right: the Python file should be run separately for each subset.
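
For example, a minimal shell sketch of such per-subset runs (the split names, per-split data paths, and dataset-name suffixes here are assumptions; adjust them to your directory layout):

    # Hypothetical wrapper: run both generation steps once per nuPlan split.
    # Assumes the db files have been separated into train/val/test subfolders.
    for SPLIT in train val test; do
        for FLAG in --only_data_dic --only_index; do
            python generation.py --num_proc 40 --sample_interval 100 \
                --dataset_name "boston_index_${SPLIT}" \
                --starting_file_num 0 --ending_file_num 10000 \
                --cache_folder {PATH_TO_CACHE_FOLDER} \
                --data_path {PATH_TO_DATASET_FOLDER}/"${SPLIT}" \
                "${FLAG}"
        done
    done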
