This repository has been archived by the owner on Dec 19, 2024. It is now read-only.

Fixed notebook, dataset, docker config, removed yacs (#154)
* fixed notebook, dataset, docker, yacs

* Cleaned notebook

* updated readme

* fixed docs

* fixed review comments

* fixed linting

* updated readme

* updated readme

* updated readme tested python version

Co-authored-by: sanjay <[email protected]>
86sanj and sanjay authored Mar 2, 2021
1 parent 4b63eb0 commit 8b34c43
Showing 11 changed files with 282 additions and 138 deletions.
3 changes: 2 additions & 1 deletion Dockerfile
@@ -13,7 +13,8 @@ RUN apt-get update \
&& ln -s /usr/bin/python3.7 /usr/local/bin/python

# Pin setuptools to 49.x.x until this [issue](https://github.com/pypa/setuptools/issues/2350) is fixed.
RUN python -m pip install --upgrade pip poetry==1.0.10 setuptools==49.6.0
RUN python -m pip install --upgrade pip poetry==1.0.10 setuptools==49.6.0 cryptography==3.3.2
# Pin cryptography to 3.3.2 until this [issue](https://github.com/Azure/azure-cli/issues/16858) is fixed.

# Add Tini
ENV TINI_VERSION v0.18.0
29 changes: 22 additions & 7 deletions README.md
@@ -1,30 +1,45 @@
# Dataset Insights

Unity Dataset Insights is a python package for understanding synthetic datasets.
This package enables users to analyze synthetic datasets generated using the [Perception SDK](https://github.com/Unity-Technologies/com.unity.perception).
User can download the data, parse the metadata and analyze on the notebook.
Unity Dataset Insights is a python package for downloading, parsing and analyzing synthetic datasets generated using the Unity [Perception package](https://github.com/Unity-Technologies/com.unity.perception).

## Installation

Dataset Insights maintains a pip package for easy installation. It can work in any standard Python environment using `pip install datasetinsights` command. We support Python 3 (>= 3.7).
Dataset Insights maintains a pip package for easy installation. It works in any standard Python environment via the `pip install datasetinsights` command. We support Python 3.7 and 3.8.

## Getting Started

### Dataset Statistics

We provide a sample [notebook](notebooks/SynthDet_Statistics.ipynb) to help you get started with dataset statistics for the [SynthDet](https://github.com/Unity-Technologies/SynthDet) project. We plan to support other sample Unity projects in the future.
We provide a sample [notebook](notebooks/Perception_Statistics.ipynb) to help you load synthetic datasets generated using [Perception package](https://github.com/Unity-Technologies/com.unity.perception) and visualize dataset statistics. We plan to support other sample Unity projects in the future.

### Dataset Download

Dataset download provides tools to download datasets from HTTP(s), GCS and Unity simulation project . You can run `download` command:
You can download datasets from HTTP(S), GCS, and Unity Simulation sources using the `download` command from the CLI or the API.

[Download Dataset](https://datasetinsights.readthedocs.io/en/latest/datasetinsights.commands.html#datasetinsights-commands-download)
[CLI](https://datasetinsights.readthedocs.io/en/latest/datasetinsights.commands.html#datasetinsights-commands-download)

```bash
datasetinsights download \
--source-uri=<xxx> \
--output=$HOME/data
```
[Programmatically](https://datasetinsights.readthedocs.io/en/latest/datasetinsights.io.downloader.html#module-datasetinsights.io.downloader.gcs_downloader)

```python3

from datasetinsights.io.downloader import (
    UnitySimulationDownloader,
    GCSDatasetDownloader,
    HTTPDatasetDownloader,
)

downloader = UnitySimulationDownloader(access_token=access_token)
downloader.download(source_uri=source_uri, output=data_root)

downloader = GCSDatasetDownloader()
downloader.download(source_uri=source_uri, output=data_root)

downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=data_root)

```

## Docker

13 changes: 0 additions & 13 deletions datasetinsights/constants.py
@@ -4,26 +4,13 @@
TIMESTAMP_SUFFIX = datetime.now().strftime("%Y%m%d-%H%M%S")
PROJECT_ROOT = os.path.dirname(os.path.dirname(__file__))


GCS_BUCKET = "thea-dev"
GCS_BASE_STR = "gs://"
HTTP_URL_BASE_STR = "http://"
HTTPS_URL_BASE_STR = "https://"
LOCAL_FILE_BASE_STR = "file://"

# This is a hack on yacs config system, as it does not allow null values
# in configs. They are working on supporting null values in config
# https://github.com/rbgirshick/yacs/pull/18.
NULL_STRING = "None"

# Root directory of all datasets
# We assume the datasets are stored in the following structure:
# data_root/
# cityscapes/
# kitti/
# nuscenes/
# synthetic/
# ...
DEFAULT_DATA_ROOT = "/data"
SYNTHETIC_SUBFOLDER = "synthetic"

2 changes: 2 additions & 0 deletions datasetinsights/datasets/exceptions.py
@@ -0,0 +1,2 @@
class DatasetNotFoundError(Exception):
""" Raise when a dataset file can't be found."""
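A minimal usage sketch of the new exception. It is self-contained: the two-line class from the diff is reproduced so the snippet runs standalone, and `load_dataset` is a hypothetical loader invented for illustration, not part of the package.

```python
class DatasetNotFoundError(Exception):
    """Raise when a dataset file can't be found."""


def load_dataset(path, known_paths):
    """Hypothetical loader: raise DatasetNotFoundError when nothing is found."""
    if path not in known_paths:
        raise DatasetNotFoundError(f"no dataset files under {path}")
    return path


try:
    load_dataset("/missing", known_paths={"/data"})
except DatasetNotFoundError as err:
    print(f"caught: {err}")  # caught: no dataset files under /missing
```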
192 changes: 192 additions & 0 deletions datasetinsights/datasets/synthetic.py
@@ -1,12 +1,25 @@
""" Simulation Dataset Catalog
"""

import fcntl
import glob
import logging
import os
import shutil
from pathlib import Path

from PIL import Image
from pyquaternion import Quaternion

from datasetinsights.datasets.unity_perception import (
AnnotationDefinitions,
Captures,
)
from datasetinsights.datasets.unity_perception.tables import SCHEMA_VERSION
from datasetinsights.io.bbox import BBox2D, BBox3D

from .exceptions import DatasetNotFoundError

logger = logging.getLogger(__name__)


@@ -72,3 +85,182 @@ def read_bounding_box_2d(annotation, label_mappings=None):
bboxes.append(box)

return bboxes
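Only the tail of `read_bounding_box_2d` appears in this hunk. The pattern it implements — converting raw annotation records into box objects, optionally filtered by `label_mappings` — can be sketched in a self-contained form. The `Box` namedtuple and the record field names below are assumptions standing in for datasetinsights' `BBox2D` and the Perception annotation schema:

```python
from collections import namedtuple

# Stand-in for datasetinsights.io.bbox.BBox2D (hypothetical fields).
Box = namedtuple("Box", ["label", "x", "y", "w", "h"])


def read_boxes(annotation, label_mappings=None):
    """Convert raw annotation dicts into Box objects.

    Records whose label_id is absent from label_mappings are skipped,
    mirroring the filtering that read_bounding_box_2d performs.
    """
    bboxes = []
    for rec in annotation:
        label = rec["label_id"]
        if label_mappings and label not in label_mappings:
            continue  # drop labels outside the requested mapping
        bboxes.append(Box(label, rec["x"], rec["y"], rec["width"], rec["height"]))
    return bboxes


ann = [
    {"label_id": 1, "x": 0, "y": 0, "width": 10, "height": 20},
    {"label_id": 9, "x": 5, "y": 5, "width": 3, "height": 3},
]
boxes = read_boxes(ann, label_mappings={1: "car"})
print(len(boxes))  # 1 — the label_id 9 record was filtered out
```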


class SynDetection2D:
"""Synthetic dataset for 2D object detection.
During instantiation, this class checks whether the dataset files
(e.g. annotations.json and the captured images) are present. If they
are not, it looks for a compressed dataset archive containing the
necessary files; if neither is found, it raises DatasetNotFoundError.
See synthetic dataset schema documentation for more details.
<https://datasetinsights.readthedocs.io/en/latest/Synthetic_Dataset_Schema.html>
Attributes:
catalog (list): catalog of all captures in this dataset
transforms: callable transformation that applies to a pair of
capture, annotation. Capture is the information captured by the
sensor, in this case an image, and annotations, which in this
dataset are 2d bounding box coordinates and labels.
label_mappings (dict): a dict of {label_id: label_name} mapping
"""

ARCHIVE_FILE = "SynthDet.zip"
SUBFOLDER = "synthetic"

def __init__(
self,
*,
data_path=None,
transforms=None,
version=SCHEMA_VERSION,
def_id=4,
**kwargs,
):
"""
Args:
data_path (str): Directory of the dataset
transforms: callable transformation that applies to a pair of
capture, annotation.
version(str): synthetic dataset schema version
def_id (int): annotation definition id used to filter results
"""
self._data_path = self._preprocess_dataset(data_path)

captures = Captures(self._data_path, version)
annotation_definition = AnnotationDefinitions(self._data_path, version)
catalog = captures.filter(def_id)
self.catalog = self._cleanup(catalog)
init_definition = annotation_definition.get_definition(def_id)
self.label_mappings = {
m["label_id"]: m["label_name"] for m in init_definition["spec"]
}

self.transforms = transforms

def __getitem__(self, index):
"""
Get the image and corresponding bounding boxes for that index.
Args:
index (int): index into the catalog
Returns (Tuple(Image, List(BBox2D))): Tuple comprising the image and
bounding boxes found in that image, with transforms applied.
"""
cap = self.catalog.iloc[index]
capture_file = cap.filename
ann = cap["annotation.values"]

capture = Image.open(os.path.join(self._data_path, capture_file))
capture = capture.convert("RGB") # Remove alpha channel
annotation = read_bounding_box_2d(ann, self.label_mappings)

if self.transforms:
capture, annotation = self.transforms(capture, annotation)

return capture, annotation

def __len__(self):
return len(self.catalog)

def _cleanup(self, catalog):
"""
Remove captures that have missing files and captures that have no
annotations, i.e. images without any objects.
Args:
catalog (pd.DataFrame): The loaded catalog of the dataset
Returns: a dataframe without captures that have missing files and
without captures that have no annotations.
"""
catalog = self._remove_captures_with_missing_files(
self._data_path, catalog
)
catalog = self._remove_captures_without_bboxes(catalog)

return catalog

@staticmethod
def _remove_captures_without_bboxes(catalog):
"""Remove captures without bounding boxes from catalog
Args:
catalog (pd.DataFrame): The loaded catalog of the dataset
Returns:
A pandas dataframe with empty bounding boxes removed
"""
keep_mask = catalog["annotation.values"].apply(len) > 0

return catalog[keep_mask]

@staticmethod
def _remove_captures_with_missing_files(root, catalog):
"""Remove captures where image files are missing
During the synthetic dataset download process, some files might be
missing due to transient HTTP request failures or URL corruption.
These captures are removed from the catalog so that they do not
break the training pipeline.
Args:
catalog (pd.DataFrame): The loaded catalog of the dataset
Returns:
A pandas dataframe of the catalog with missing files removed
"""

def exists(capture_file):
path = Path(root) / capture_file

return path.exists()

keep_mask = catalog.filename.apply(exists)

return catalog[keep_mask]

@staticmethod
def _preprocess_dataset(data_path):
""" Preprocess dataset inside data_path and un-archive if necessary.
Args:
data_path (str): Path where dataset is stored.
Return:
Path of the dataset files.
"""
archive_file = Path(data_path) / SynDetection2D.ARCHIVE_FILE
if archive_file.exists():
file_descriptor = os.open(archive_file, os.O_RDONLY)

try:
fcntl.flock(file_descriptor, fcntl.LOCK_EX)

unarchived_path = Path(data_path) / SynDetection2D.SUBFOLDER
if not SynDetection2D.is_dataset_files_present(unarchived_path):
shutil.unpack_archive(
filename=archive_file, extract_dir=unarchived_path
)

return unarchived_path
finally:
os.close(file_descriptor)
elif SynDetection2D.is_dataset_files_present(data_path):
# This is for dataset generated by unity simulation.
# In this case, all data are downloaded directly in the data_path
return data_path
else:
raise DatasetNotFoundError(
f"Expecting either an archive file {archive_file} or dataset "
f"files directly under {data_path}"
)

@staticmethod
def is_dataset_files_present(data_path):
return os.path.isdir(data_path) and any(glob.glob(f"{data_path}/**/*"))
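The `_cleanup` step above (drop captures whose image file is missing, drop captures without any bounding boxes) can be sketched without pandas as plain-Python filtering over catalog rows. The dict keys mirror the catalog columns used in the class; the list-of-dicts catalog is an assumption made so the sketch is self-contained:

```python
import os
import tempfile
from pathlib import Path


def cleanup(root, catalog):
    """Keep rows whose image file exists and whose annotation list is non-empty."""
    kept = []
    for row in catalog:
        if not (Path(root) / row["filename"]).exists():
            continue  # image missing, e.g. a failed download
        if not row["annotation.values"]:
            continue  # no objects in this capture
        kept.append(row)
    return kept


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "img0.png"), "w").close()
    catalog = [
        {"filename": "img0.png", "annotation.values": [{"label_id": 1}]},
        {"filename": "img0.png", "annotation.values": []},       # no boxes
        {"filename": "missing.png", "annotation.values": [{}]},  # file absent
    ]
    print(len(cleanup(root, catalog)))  # 1
```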
18 changes: 0 additions & 18 deletions docs/source/datasetinsights.datasets.dummy.rst

This file was deleted.

18 changes: 0 additions & 18 deletions docs/source/datasetinsights.datasets.rst
@@ -5,26 +5,8 @@ datasetinsights.datasets
.. toctree::
:maxdepth: 2

datasetinsights.datasets.dummy
datasetinsights.datasets.unity_perception


datasetinsights.datasets.base
-----------------------------

.. automodule:: datasetinsights.datasets.base
:members:
:undoc-members:
:show-inheritance:

datasetinsights.datasets.coco
-----------------------------

.. automodule:: datasetinsights.datasets.coco
:members:
:undoc-members:
:show-inheritance:

datasetinsights.datasets.exceptions
-----------------------------------

