feat: support Encord datasets #31

Merged
merged 10 commits into from
Feb 27, 2024

eloy-encord
Contributor

Add support for Encord datasets.

Still a WIP: the dataset class returns what should be the expected data, but it is probably not yet in the format the data loader needs, so it still requires testing and a follow-up fix.

All feedback is welcome 😊! This PR will stay in draft until the tests are ✅.

@Jim-Encord
Contributor

Bugs:
I've tried two other Encord datasets instead of rsicd, so as to use something smaller.
If I use the crab or accordion dataset (which we all have access to), I get:

File "/Users/encord/Documents/cord/text-to-image-eval/.venv/lib/python3.11/site-packages/encord/http/bundle.py", line 179, in bundled_operation
    assert len(result) == 1, "Expected a singular response for a singular request!"

whilst downloading the dataset from Encord.
I can make a PR including this dataset, so we can test on that.

Downloading still needs to be faster for datasets with lots of individual images.
We should fix the transform, as it generates ugly code on the dataset side.
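
One possible way to speed up datasets made of many individual images is to fetch the files concurrently and skip those already on disk. The following is a minimal sketch, not the approach taken in this PR: it assumes each image can be fetched independently, and both `download_all` and the `fetch` callable are hypothetical names that are not part of the Encord SDK or this repository.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def download_all(
    items: Iterable[tuple[str, Path]],
    fetch: Callable[[str, Path], None],
    max_workers: int = 8,
) -> list[Path]:
    """Fetch every (url, destination) pair that is not on disk yet.

    `fetch` is a caller-supplied function (hypothetical here) that writes
    the content found at `url` to `destination`.
    Returns the newly downloaded paths, in input order.
    """
    # Skip files that a previous run already downloaded.
    pending = [(url, dest) for url, dest in items if not dest.exists()]

    def _fetch(pair: tuple[str, Path]) -> Path:
        url, dest = pair
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, dest)
        return dest

    # Threads fit this workload because it is I/O-bound (network waits).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_fetch, pending))
```

Because files that already exist are skipped, an interrupted download can be resumed by calling the function again with the same items.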
@eloy-encord
Contributor Author

@Jim-Encord I'm not sure where that error came from, but the dataset output wasn't compatible with the model's input. With my latest fix it should now run; at least I have run one complete dataset and it's OK. I'm still downloading the example dataset (rsicd) and will leave it running overnight, as it's very slow locally.

@eloy-encord eloy-encord marked this pull request as ready for review February 23, 2024 16:43
Contributor

@Jim-Encord Jim-Encord left a comment

LGTM. I have tested it with one other classification dataset, but we probably want Frederik to look at it as well.

@eloy-encord eloy-encord merged commit b29cb61 into main Feb 27, 2024
1 check passed
@eloy-encord eloy-encord deleted the eloy/feat-support-encord-datasets branch February 27, 2024 10:33
3 participants