Feat/abdul/yolo #27
Conversation
…files, and tests
- Introduced YOLOv8-v1 and YOLOv8-v2 for building footprint segmentation.
- Added ZenML pipelines for training and inference.
- Created Dockerfiles for isolated runtime environments.
- Implemented comprehensive smoke tests to validate functionality.
- Updated .gitignore to include new sample data directories.
…from STAC
- Added functions to load model weights and hyperparameters from STAC Item JSON files.
- Updated preprocess and training_pipeline functions to utilize loaded hyperparameters.
- Enhanced stac-item.json files for both YOLOv8-v1 and YOLOv8-v2 with additional metadata and structure.
- Improved documentation for clarity on model configuration and usage.
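For illustration, a minimal sketch of the described mechanism; the property key and file layout below are guesses, not the PR's actual STAC schema:

import json
from pathlib import Path


def load_hyperparameters(stac_item_path: str) -> dict:
    # Read the STAC Item JSON and pull a hyperparameters block out of its
    # properties; "fair:hyperparameters" is a hypothetical key, not the PR's schema.
    item = json.loads(Path(stac_item_path).read_text())
    return item.get("properties", {}).get("fair:hyperparameters", {})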
- Deleted YOLOv8-v1 Dockerfile, pipeline, README, STAC item, and tests to streamline the model repository.
- Updated .gitignore to exclude new directories for runs and weights.
- Consolidated focus on YOLOv8-v2 for building footprint segmentation.
Feature/yolo
…artifact tracking
- Modified the run_preprocessing function to return a list of tuples containing image data and corresponding label data.
- Enhanced error handling in train_model to raise an error if the data loader is empty.
- Updated training_pipeline to accommodate the new data loader structure.
…into feature/yolo
- Removed unnecessary line breaks and consolidated code for better clarity.
- Updated error message formatting for consistency.
- Minor adjustments in the test file for improved readability.
refactor(pipeline): streamline code formatting and improve readability
- Introduced a new step to split an existing YOLO dataset into train and validation sets.
- Implemented shuffling and validation fraction control via hyperparameters.
- Ensured proper directory structure and error handling for dataset integrity.
- Updated the training pipeline to include the dataset splitting step.
- Updated the `run_preprocessing` function to return a list of tuples containing image data and corresponding label data for ZenML artifact tracking.
- Added error handling to ensure the data loader is not empty before proceeding with model training.
- Adjusted the training pipeline to utilize the new data loader structure.
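As a purely illustrative sketch of the behaviour this commit describes (file layout, extensions, and helper names are assumptions, not the pipeline's actual code):

from pathlib import Path


def run_preprocessing(chips_dir: str, labels_dir: str) -> list[tuple[bytes, str]]:
    # Pair every chip with its YOLO label file so ZenML can track the pairs
    # as a single artifact (assumed layout: *.png chips, *.txt labels).
    pairs = []
    for image_path in sorted(Path(chips_dir).glob("*.png")):
        label_path = Path(labels_dir) / f"{image_path.stem}.txt"
        pairs.append((image_path.read_bytes(), label_path.read_text()))
    return pairs


def train_model(data_loader: list[tuple[bytes, str]]) -> None:
    # Fail fast instead of silently training on an empty dataset.
    if not data_loader:
        raise RuntimeError("Data loader is empty; check the preprocessing output.")
    # ... actual training would follow here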
- Added a new function to resolve input directories for local and remote datasets.
- Updated the `preprocess` function to return the preprocessed directory path.
- Refactored the `split_dataset` function to generate YOLO train/val splits and return split metadata.
- Adjusted the smoke test to validate the new preprocessing and dataset splitting workflow.
- Added `split_seed` parameter to the configuration for reproducibility.
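Likewise, a hedged sketch of a seeded YOLO train/val split; directory names, file extensions, and the returned metadata keys are assumptions:

import random
import shutil
from pathlib import Path


def split_dataset(dataset_dir: str, val_fraction: float = 0.2, split_seed: int = 42) -> dict:
    # Shuffle deterministically with split_seed, then move a fraction of the
    # chips (and their matching label files) into val/, the rest into train/.
    images = sorted(Path(dataset_dir, "images").glob("*.png"))
    random.Random(split_seed).shuffle(images)
    n_val = int(len(images) * val_fraction)
    for split, subset in (("val", images[:n_val]), ("train", images[n_val:])):
        for image_path in subset:
            label_path = Path(dataset_dir, "labels", f"{image_path.stem}.txt")
            for src in (image_path, label_path):
                dest = Path(dataset_dir, split, src.parent.name)
                dest.mkdir(parents=True, exist_ok=True)
                shutil.move(str(src), str(dest / src.name))
    return {"train": len(images) - n_val, "val": n_val, "split_seed": split_seed}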
- Added a missing comma in the metadata dictionary returned by the split_dataset function for improved syntax correctness.
Merge: Master
- rename yolov8v2 to yolo_v8_segmentation, as we will not have multiple YOLO versions anymore
- get rid of the bash scripts in tests and move the tests to production test cases with pytest, as defined in the instructions: https://hotosm.github.io/fAIr-models/contributing/model/#testing, with an example here: https://github.com/hotosm/fAIr-models/tree/master/models/yolo11n_detection/tests. Tests should validate each function defined in the pipeline.
- separate the Dockerfile into 3 stages (builder, runtime, and test): https://hotosm.github.io/fAIr-models/contributing/model/#dockerfile, with an example here: https://github.com/hotosm/fAIr-models/blob/master/models/yolo11n_detection/Dockerfile (a rough sketch follows this list)
- slim down README.md to be a user-friendly model card rather than development decisions; those can live in the PR description
- I haven't reviewed the pipeline yet, but the CI will validate the pipeline first and then I will have a look in the near future!
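For orientation, a minimal sketch of that three-stage layout, assuming a generic Python base image and a requirements.txt; the linked yolo11n_detection Dockerfile remains the authoritative example:

# builder: install dependencies once
ARG BASE_IMAGE=python:3.11-slim
FROM ${BASE_IMAGE} AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# runtime: only the installed dependencies plus the model code
FROM ${BASE_IMAGE} AS runtime
COPY --from=builder /install /usr/local
COPY . /app
WORKDIR /app

# test: runtime plus pytest, so CI can run the test suite in-container
FROM runtime AS test
RUN pip install pytest
CMD ["pytest", "-q", "tests"]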
…ing and update p_val parameter
…related documentation
…rovider info to stac-item.json
Welcome to Codecov 🎉 Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️
…st related code for consistency
…ters and improve output directory handling
…eturn valid empty FeatureCollection
…n to improve type clarity
…put decoding for enhanced prediction functionality
…ure collection building, and prediction functions for improved clarity and performance
…ling with new image preprocessing and validation split features
…ts and redundant mock classes
        patch("models.yolo_v8_segmentation.pipeline.log_metadata"),
    ):
        metrics = evaluate_model.entrypoint(
            trained_model=b"fake",
Instead of mocking the model, the idea is to pass the model that was really trained on the toy data from above! This test_steps should run the real model training followed by the evaluation, with metrics being produced; the mock should be removed.
| "metrics/recall(M)": 0.74, | ||
| } | ||
|
|
||
| class _MockModel: |
Let's remove the mock here as well; it should be something like this:
split_info = split_dataset.entrypoint(
    dataset_chips=str(toy_chips),
    dataset_labels=str(toy_labels),
    hyperparameters=base_hyperparameters,
)
model = train_model.entrypoint(
    dataset_chips=str(toy_chips),
    dataset_labels=str(toy_labels),
    base_model_weights=pretrained_weights,
    hyperparameters=base_hyperparameters,
    split_info=split_info,
    num_classes=2,
)
metrics = evaluate_model.entrypoint(
    trained_model=model,
    dataset_chips=str(toy_chips),
    dataset_labels=str(toy_labels),
    hyperparameters=base_hyperparameters,
    split_info=split_info,
    num_classes=2,
)

@@ -0,0 +1,84 @@
# syntax=docker/dockerfile:1.7

ARG BASE_IMAGE=ghcr.io/hotosm/fair-utilities-yolo:cpu-latest

ENV UV_LINK_MODE=copy

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/
# runtime and tests share one interpreter (do not use `--system` / /usr/local).
RUN --mount=type=cache,target=/root/.cache/uv \
    SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 \
    uv pip install --python /app/.venv/bin/python \
gpu and I think they are already available in the base image!
# ---------------------------------------------------------------------------
# Inference stage: runtime + serving deps for smoke/live API
# ---------------------------------------------------------------------------
FROM runtime AS inference
use distroless images and only install the minimal inference dependencies, example here:
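A rough sketch of what a distroless inference stage could look like; the image tags, requirements-inference.txt, and predict.py are placeholders, not files from this PR:

# builder: resolve only the minimal serving dependencies into /deps
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements-inference.txt .
RUN pip install --target=/deps -r requirements-inference.txt

# inference: distroless Python, no shell or package manager inside
FROM gcr.io/distroless/python3-debian12 AS inference
ENV PYTHONPATH=/deps
COPY --from=builder /deps /deps
COPY predict.py /app/predict.py
WORKDIR /app
# the distroless python3 image already uses python3 as its entrypoint,
# so only the script is passed as the command
CMD ["/app/predict.py"]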
    rasterize=True,
    rasterize_options=["binary"],
    georeference_images=True,
    multimasks=False,
if you pass epsg=4326 here in this function, the patch above might not be required!
This is why it was working before. When I didn't pass the CRS, it went to the other branch, which caused the error. That is why I added the patch.
yes, you can pass 4326 (this is strict because the YOLO conversion logic was written in degrees) to make it strict! The conversion would be handled by the code automatically.
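For concreteness, a sketch of the suggested call; the rasterize/georeference keywords come from the diff above, the paths and the import are assumptions, and the epsg keyword follows this comment rather than a verified signature:

# Assumption: the call in the diff is hot_fair_utilities.preprocess.
from hot_fair_utilities import preprocess

input_path = "data/chips"                  # hypothetical input directory
preprocessed_output = "data/preprocessed"  # hypothetical output directory

preprocess(
    input_path,
    preprocessed_output,
    rasterize=True,
    rasterize_options=["binary"],
    georeference_images=True,
    multimasks=False,
    epsg=4326,  # keep data in degrees so the YOLO conversion logic stays strict
)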
it went to the other branch
on this, which branch do you mean?
            output_path=output_geojson,
            remove_inputs=False,
        )
    except Exception as exc:
this will return an empty polygon even when there is a bug in polygonize, which should not be the case; the GeoJSON returned from this function would just come back empty. This error handling will make it hard for us to catch the bug!
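A hedged sketch of the direction this comment suggests (helper name, mask file pattern, and the polygonize import are assumptions): return a valid empty FeatureCollection only when there are genuinely no predictions, and let a real polygonize failure propagate:

import json
from pathlib import Path

# Assumption: polygonize is the hot_fair_utilities call shown in the diff above.
from hot_fair_utilities import polygonize


def masks_to_featurecollection(prediction_masks: str, output_geojson: str) -> dict:
    # Hypothetical helper. No detections is a valid outcome, so only in that
    # case return a valid empty FeatureCollection.
    if not any(Path(prediction_masks).glob("*.tif")):
        return {"type": "FeatureCollection", "features": []}
    # No try/except around polygonize: a bug there should fail the step loudly
    # instead of being swallowed and reported as an empty GeoJSON.
    polygonize(prediction_masks, output_path=output_geojson, remove_inputs=False)
    with open(output_geojson) as f:
        return json.load(f)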
    return json.load(f)


def _select_or_merge_labels(labels_path: Path, destination: Path) -> None:
but the input is already a single GeoJSON; it's the chips and the GeoJSON! So this function might be redundant? Why would we need to merge labels that are already in a single GeoJSON file?
    return input_dir


def _training_cache_dir(dataset_chips: str, dataset_labels: str) -> Path:
why do we need a cache for training? It just fills up the container as multiple training runs progress!
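If the only need is a per-run scratch area, one alternative (not what the PR does) is an ephemeral temp directory; helper names here are illustrative:

import shutil
import tempfile
from pathlib import Path


def training_workdir() -> Path:
    # A fresh directory per training run, so repeated trainings do not
    # accumulate data inside the container.
    return Path(tempfile.mkdtemp(prefix="yolo_train_"))


def cleanup_workdir(workdir: Path) -> None:
    # Remove the scratch area once the run has finished.
    shutil.rmtree(workdir, ignore_errors=True)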
Testing CI pipeline