Uploading from S3, manifest.jsonl file needs to be in the same location as the image data #8077

Open
astringfield opened this issue Jun 25, 2024 · 0 comments

A summary of my use-case:

  1. I'm trying to upload data from S3 to a local CVAT instance running in Docker
  2. I'm using the CVAT CLI
  3. I've created and verified a manifest.jsonl file

Question

In both of the cases below, I specify the S3 prefix path where the images are stored. However, the command only works if the manifest is stored in the same S3 location as the image data; if the manifest is elsewhere in S3, the upload fails. I've included one successful and one unsuccessful upload below to illustrate the problem.

Is this behaviour expected, or is there a way to upload to CVAT from S3 with the manifest file stored separately from the images? I'd really appreciate any help you can provide.
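
For anyone trying to reproduce this, here is a minimal sketch of one possible explanation. It is a guess, not confirmed CVAT behaviour: it assumes each image line in manifest.jsonl carries `name` and `extension` fields, and that the entries are resolved relative to the S3 prefix of the manifest itself. The helper names `manifest_image_paths` and `unmatched` are hypothetical and are not part of CVAT or cvat-cli. Under those assumptions, moving the manifest changes every resolved image key, so nothing matches `--filename_pattern` any more.

```python
import json
from fnmatch import fnmatchcase
from posixpath import dirname, join


def manifest_image_paths(manifest_text, manifest_key):
    """Resolve manifest entries against the manifest's own S3 prefix.

    ASSUMPTION: entries are relative to the manifest location. This is a
    hypothesis about the failure, not documented CVAT behaviour.
    """
    base = dirname(manifest_key)
    paths = []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Header lines (for example version/type records) have no "name".
        if "name" not in entry:
            continue
        paths.append(join(base, entry["name"] + entry.get("extension", "")))
    return paths


def unmatched(paths, pattern):
    """Return the resolved keys that do not match the filename pattern."""
    return [p for p in paths if not fnmatchcase(p, pattern)]


# Illustrative manifest content (made-up file names, not real data).
SAMPLE = "\n".join([
    '{"version": "1.1"}',
    '{"type": "images"}',
    '{"name": "img_001", "extension": ".png", "width": 640, "height": 480}',
    '{"name": "img_002", "extension": ".png", "width": 640, "height": 480}',
])
PATTERN = "path/to/images/on/s3/*.png"

same = manifest_image_paths(SAMPLE, "path/to/images/on/s3/manifest.jsonl")
moved = manifest_image_paths(SAMPLE, "a/different/location/on/s3/manifest.jsonl")

print(unmatched(same, PATTERN))
print(unmatched(moved, PATTERN))
```

If this hypothesis holds, the first call reports no unmatched keys and the second reports every image, which would be consistent with the "No media data found" error in the failing case.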

manifest.jsonl in the same S3 location as images

When I run the command with the manifest.jsonl file stored in the same location in S3 as the images, the upload is successful:

# Command
cvat-cli --auth <cvat_username>:<cvat_password> \
    --server-host http://localhost \
    --server-port 8080 \
    --organization <org_name> \
    create "<task_name>" --use_cache \
    --project_id <proj_id> \
    --annotation_path "/path/to/local/annotations.json" \
    --annotation_format "COCO 1.0" \
    --cloud_storage_id <cloud_id> \
    --filename_pattern "path/to/images/on/s3/*.png" \
    share path/to/images/on/s3/manifest.jsonl

# Output (success)
[2024-06-25 15:46:33] INFO: Created task ID: 227 NAME: <task_name>
[2024-06-25 15:46:33] INFO: Awaiting for task 227 creation...
[2024-06-25 15:46:35] INFO: Task 227 creation status: Finished (message=)
Uploading data: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 205M/205M [00:01<00:00, 158MB/s]
[2024-06-25 15:46:52] INFO: Annotation file '/path/to/local/annotations.json' for task #227 uploaded
Created task id 227

manifest.jsonl in a different S3 location from images

However, when I run the command with the manifest.jsonl file stored in a different location in S3 from the images, the upload results in an error:

# Command
cvat-cli --auth <cvat_username>:<cvat_password> \
    --server-host http://localhost \
    --server-port 8080 \
    --organization <org_name> \
    create "<task_name>" --use_cache \
    --project_id <proj_id> \
    --annotation_path "/path/to/local/annotations.json" \
    --annotation_format "COCO 1.0" \
    --cloud_storage_id <cloud_id> \
    --filename_pattern "path/to/images/on/s3/*.png" \
    share a/different/location/on/s3/manifest.jsonl

# Output (error)
[2024-06-25 15:44:54] INFO: Created task ID: 225 NAME: <task_name>
[2024-06-25 15:44:54] INFO: Awaiting for task 225 creation...
[2024-06-25 15:44:56] INFO: Task 225 creation status: Failed (message=Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
    rv = job.perform()
  File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
    self._result = self._execute()
  File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/django/cvat/apps/engine/task.py", line 646, in _create_thread
    media, task_mode = _validate_data(media, manifest_files)
  File "/home/django/cvat/apps/engine/task.py", line 260, in _validate_data
    raise ValueError('No media data found')
ValueError: No media data found)
[2024-06-25 15:44:56] CRITICAL: Status Code: 200
Reason: OK
HTTP response headers: HTTPHeaderDict({'Allow': 'GET, HEAD, OPTIONS', 'Content-Length': '846', 'Content-Type': 'application/vnd.cvat+json', 'Cross-Origin-Opener-Policy': 'same-origin', 'Date': 'Tue, 25 Jun 2024 05:44:56 GMT', 'Referrer-Policy': 'same-origin, strict-origin-when-cross-origin', 'Server': 'nginx', 'Vary': 'Accept, Accept-Encoding, Origin, Cookie', 'X-Content-Type-Options': 'nosniff, nosniff', 'X-Frame-Options': 'DENY, deny', 'X-Request-Id': 'c8ebf596-82bd-4bee-8f45-3583a247db8e'})
HTTP response body: b'{"state":"Failed","message":"Traceback (most recent call last):\\n  File \\"/opt/venv/lib/python3.10/site-packages/rq/worker.py\\", line 1431, in perform_job\\n    rv = job.perform()\\n  File \\"/opt/venv/lib/python3.10/site-packages/rq/job.py\\", line 1280, in perform\\n    self._result = self._execute()\\n  File \\"/opt/venv/lib/python3.10/site-packages/rq/job.py\\", line 1317, in _execute\\n    result = self.func(*self.args, **self.kwargs)\\n  File \\"/usr/lib/python3.10/contextlib.py\\", line 79, in inner\\n    return func(*args, **kwds)\\n  File \\"/home/django/cvat/apps/engine/task.py\\", line 646, in _create_thread\\n    media, task_mode = _validate_data(media, manifest_files)\\n  File \\"/home/django/cvat/apps/engine/task.py\\", line 260, in _validate_data\\n    raise ValueError(\'No media data found\')\\nValueError: No media data found","progress":0.0}'