Add more information about dataset #110

IcyFeather233 · 2024-06-27T14:06:01Z

In the testenv.yaml configuration, train_url and test_url are configured.

But there is no documentation on how these two data files should be prepared.

For example, what data format is needed? I think an example would make it easier for people new to this project to get started.

The documentation should cover TXT, CSV, and JSON, which are the formats handled by the data-loading classes.


IcyFeather233 · 2024-06-27T15:37:42Z

By the way, I think the JSONDataParse class is not general enough. It seems to be used only for COCO, which is an image dataset. I think it should be a universal JSON data parsing class. Here is the current implementation:


class JSONDataParse(BaseDataSource, ABC):
    """
    json file which contain Structured Data parser
    """

    def __init__(self, data_type, func=None):
        super(JSONDataParse, self).__init__(data_type=data_type, func=func)

    def parse(self, *args, **kwargs):
        DIRECTORY = "train"
        LABEL_PATH = "*/gt/gt_val_half.txt"
        filepath = Path(*args)
        self.data_dir = Path(Path(filepath).parents[1], DIRECTORY)
        self.coco = COCO(filepath)
        self.ids = self.coco.getImgIds()
        self.class_ids = sorted(self.coco.getCatIds())
        self.annotations = [self.load_anno_from_ids(_ids) for _ids in self.ids]
        self.x = {"data_dir": self.data_dir, "coco": self.coco, "ids": self.ids, "class_ids": self.class_ids, "annotations": self.annotations}
        self.y = [f for f in self.data_dir.glob(LABEL_PATH)]

    def load_anno_from_ids(self, id_):
        im_ann = self.coco.loadImgs(id_)[0]
        width = im_ann["width"]
        height = im_ann["height"]
        frame_id = im_ann["frame_id"]
        video_id = im_ann["video_id"]
        anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=False)
        annotations = self.coco.loadAnns(anno_ids)
        objs = []
        for obj in annotations:
            x1 = obj["bbox"][0]
            y1 = obj["bbox"][1]
            x2 = x1 + obj["bbox"][2]
            y2 = y1 + obj["bbox"][3]
            if obj["area"] > 0 and x2 >= x1 and y2 >= y1:
                obj["clean_bbox"] = [x1, y1, x2, y2]
                objs.append(obj)

        num_objs = len(objs)
        res = np.zeros((num_objs, 6))

        for ix, obj in enumerate(objs):
            cls = self.class_ids.index(obj["category_id"])
            res[ix, 0:4] = obj["clean_bbox"]
            res[ix, 4] = cls
            res[ix, 5] = obj["track_id"]

        file_name = im_ann["file_name"] if "file_name" in im_ann else "{:012}".format(id_) + ".jpg"
        img_info = (height, width, frame_id, video_id, file_name)

        del im_ann, annotations

        return (res, img_info, file_name)
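
To illustrate what a "universal" JSON parser could look like, here is a minimal sketch. It is not the project's implementation: the class name `GenericJSONDataParse`, the `label_key` parameter, and the choice to accept both a JSON array and JSON Lines are all assumptions made for illustration. It is also written as a stand-alone class so that it runs without the project's `BaseDataSource`.

```python
import json
from pathlib import Path


class GenericJSONDataParse:
    """Hypothetical format-agnostic JSON parser (illustrative sketch only)."""

    def __init__(self, label_key="label"):
        # label_key is an assumed parameter: the field that holds the target value.
        self.label_key = label_key
        self.x = []
        self.y = []

    def parse(self, filepath):
        text = Path(filepath).read_text(encoding="utf-8").strip()
        try:
            # Case 1: the whole file is one JSON document (array or object).
            records = json.loads(text)
        except json.JSONDecodeError:
            # Case 2: JSON Lines, i.e. one JSON object per non-empty line.
            records = [json.loads(line) for line in text.splitlines() if line.strip()]
        if isinstance(records, dict):
            records = [records]

        self.x, self.y = [], []
        for record in records:
            # Everything except the label field is treated as input features.
            self.x.append({k: v for k, v in record.items() if k != self.label_key})
            self.y.append(record.get(self.label_key))
        return self.x, self.y
```

With a parser shaped like this, the dataset documentation could state the expected record layout (for example, one object per sample with a `label` field) instead of depending on COCO-specific keys such as `track_id` or `video_id`.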

MooreZheng added the kind/feature and good first issue labels · Aug 13, 2024
