Question about dataset generation. #191
Comments
No, the two parts are completely independent. You can concurrently run

```
python generation.py --num_proc 40 --sample_interval 100 \
    --dataset_name boston_index_demo --starting_file_num 0 \
    --ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
    --data_path {PATH_TO_DATASET_FOLDER} --only_data_dic
```

for pkl and

```
python generation.py --num_proc 40 --sample_interval 100 \
    --dataset_name boston_index_interval100 --starting_file_num 0 \
    --ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
    --data_path {PATH_TO_DATASET_FOLDER} --only_index
```

for arrow.
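For reference, a minimal shell sketch for launching both passes at the same time; the flags are copied verbatim from the commands above, while the concrete paths are placeholders to be substituted with your own folders:

```bash
#!/usr/bin/env bash
# Placeholder paths -- replace with your own cache and dataset folders.
CACHE_FOLDER=/path/to/cache
DATA_PATH=/path/to/nuplan_dataset

# Pass 1: build the pkl data dictionaries (runs in the background).
python generation.py --num_proc 40 --sample_interval 100 \
    --dataset_name boston_index_demo --starting_file_num 0 \
    --ending_file_num 10000 --cache_folder "$CACHE_FOLDER" \
    --data_path "$DATA_PATH" --only_data_dic &

# Pass 2: build the arrow index (also in the background).
python generation.py --num_proc 40 --sample_interval 100 \
    --dataset_name boston_index_interval100 --starting_file_num 0 \
    --ending_file_num 10000 --cache_folder "$CACHE_FOLDER" \
    --data_path "$DATA_PATH" --only_index &

# Wait for both background jobs to finish.
wait
```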
Thanks for your reply. Should I split the Nuplan dataset into train, val, and test? Then, should I perform the above two steps to generate pkl and arrow for each subset, respectively?
You don't need to generate each subset separately; it will automatically generate all the subsets.
Thank you for pointing out my mistakes. You are right. We should run the Python file separately for each subset.
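If you do run the script once per split, one possible way to script it is sketched below. This is only an illustration: it assumes each split sits in its own sub-folder and is selected through --data_path, and the --dataset_name values are made up for the example; neither is confirmed by this thread.

```bash
#!/usr/bin/env bash
# Assumed layout: one sub-folder per split under the dataset root.
DATASET_ROOT=/path/to/nuplan_dataset
CACHE_FOLDER=/path/to/cache

for SPLIT in train val test; do
    # pkl pass for this split; the arrow pass would be the same command
    # with --only_index and its own --dataset_name.
    python generation.py --num_proc 40 --sample_interval 100 \
        --dataset_name "boston_index_${SPLIT}" --starting_file_num 0 \
        --ending_file_num 10000 --cache_folder "$CACHE_FOLDER" \
        --data_path "${DATASET_ROOT}/${SPLIT}" --only_data_dic
done
```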
Your processing steps are to first process the db files into pkl and then generate a dataset in arrow format. However, the original nuplan dataset is organized hierarchically (by map), so do I need to process each part one by one?