Add datasets_root to training config #168
Open
By default, the full path to each sample image in a created dataset is the concatenation of the parent folder of the JSONL file, a subfolder named "arxiv", and the image path given in the JSONL file. For example, if the dataset path is "/path/to/exp_folder/train.jsonl" and the path to the first sample is "sample_paper/01.png", then the full sample path will be "/path/to/exp_folder/arxiv/sample_paper/01.png".
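The default resolution described above can be sketched as follows. This is only an illustration of the concatenation, not the code in nougat/utils/dataset.py; the helper name resolve_sample_path is hypothetical, and PurePosixPath is used so the result does not depend on the host platform.

```python
from pathlib import PurePosixPath


def resolve_sample_path(dataset_path: str, image_path: str) -> PurePosixPath:
    """Hypothetical helper mirroring the default behaviour described above."""
    # Parent folder of the JSONL file, then the hard-coded "arxiv" subfolder,
    # then the relative image path stored in the JSONL entry.
    return PurePosixPath(dataset_path).parent / "arxiv" / image_path


full_path = resolve_sample_path("/path/to/exp_folder/train.jsonl", "sample_paper/01.png")
print(full_path)  # /path/to/exp_folder/arxiv/sample_paper/01.png
```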
However, the root subfolder name "arxiv" is not mentioned in the dataset creation tutorial in the README (which uses "path/paired/output" or "images" instead), so when I tried to run the train.py script with my samples in a subfolder called "folder_paired", I got an error. This PR lets the user choose any subfolder name by setting "datasets_root" in the training config file.
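A minimal sketch of the intended behaviour with a configurable root, under stated assumptions: the function name resolve_with_root is hypothetical, and keeping "arxiv" as the fallback default is an assumption made here for backward compatibility, not something confirmed by the diff.

```python
from pathlib import PurePosixPath


def resolve_with_root(
    dataset_path: str, image_path: str, datasets_root: str = "arxiv"
) -> PurePosixPath:
    """Hypothetical helper: same concatenation, with the subfolder configurable."""
    # datasets_root stands in for the value read from the training config;
    # the "arxiv" default is an assumption, chosen to match the old behaviour.
    return PurePosixPath(dataset_path).parent / datasets_root / image_path


custom = resolve_with_root(
    "/path/to/exp_folder/train.jsonl",
    "sample_paper/01.png",
    datasets_root="folder_paired",
)
print(custom)  # /path/to/exp_folder/folder_paired/sample_paper/01.png
```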
I'm wondering whether that is all that was implied by this TODO?
nougat/nougat/utils/dataset.py
Line 227 in 47c77d7