
Feat/abdul/yolo #27

Open

AbdelrahmanKatkat wants to merge 36 commits into master from feat/abdul/yolo

Conversation

@AbdelrahmanKatkat
Contributor

Testing CI pipeline

AbdelrahmanKatkat and others added 14 commits March 2, 2026 00:03
…files, and tests

- Introduced YOLOv8-v1 and YOLOv8-v2 for building footprint segmentation.
- Added ZenML pipelines for training and inference.
- Created Dockerfiles for isolated runtime environments.
- Implemented comprehensive smoke tests to validate functionality.
- Updated .gitignore to include new sample data directories.
…from STAC

- Added functions to load model weights and hyperparameters from STAC Item JSON files.
- Updated preprocess and training_pipeline functions to utilize loaded hyperparameters.
- Enhanced stac-item.json files for both YOLOv8-v1 and YOLOv8-v2 with additional metadata and structure.
- Improved documentation for clarity on model configuration and usage.
- Deleted YOLOv8-v1 Dockerfile, pipeline, README, STAC item, and tests to streamline the model repository.
- Updated .gitignore to exclude new directories for runs and weights.
- Consolidated focus on YOLOv8-v2 for building footprint segmentation.
…artifact tracking

- Modified the run_preprocessing function to return a list of tuples containing image data and corresponding label data.
- Enhanced error handling in train_model to raise an error if the data loader is empty.
- Updated training_pipeline to accommodate the new data loader structure.
- Removed unnecessary line breaks and consolidated code for better clarity.
- Updated error message formatting for consistency.
- Minor adjustments in the test file for improved readability.
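The data-loader changes described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `run_preprocessing` and `train_model` come from the commit message, but the signatures, file layout, and error message are assumptions.

```python
from pathlib import Path


def run_preprocessing(image_dir: Path, label_dir: Path) -> list[tuple[bytes, str]]:
    """Pair each chip with its label file so they can be tracked as one artifact."""
    pairs: list[tuple[bytes, str]] = []
    for image_path in sorted(image_dir.glob("*.png")):
        label_path = label_dir / f"{image_path.stem}.txt"
        if label_path.exists():
            pairs.append((image_path.read_bytes(), label_path.read_text()))
    return pairs


def train_model(data_loader: list[tuple[bytes, str]]) -> None:
    # Fail fast instead of silently "training" on nothing.
    if not data_loader:
        raise ValueError("Data loader is empty: preprocessing produced no image/label pairs.")
    ...  # actual YOLO training would start here
```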
refactor(pipeline): streamline code formatting and improve readability
- Introduced a new step to split an existing YOLO dataset into train and validation sets.
- Implemented shuffling and validation fraction control via hyperparameters.
- Ensured proper directory structure and error handling for dataset integrity.
- Updated the training pipeline to include the dataset splitting step.
- Updated the `run_preprocessing` function to return a list of tuples containing image data and corresponding label data for ZenML artifact tracking.
- Added error handling to ensure the data loader is not empty before proceeding with model training.
- Adjusted the training pipeline to utilize the new data loader structure.
- Added a new function to resolve input directories for local and remote datasets.
- Updated the `preprocess` function to return the preprocessed directory path.
- Refactored the `split_dataset` function to generate YOLO train/val splits and return split metadata.
- Adjusted the smoke test to validate the new preprocessing and dataset splitting workflow.
- Added `split_seed` parameter to the configuration for reproducibility.
- Added a missing comma in the metadata dictionary returned by the split_dataset function for improved syntax correctness.
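The splitting step described in these commits can be sketched like this. The names `split_dataset`, `val_fraction`, and `split_seed` are taken from the commit descriptions; everything else (operating on chip names, the return shape) is an assumption for illustration.

```python
import random


def split_dataset(chip_names: list[str], val_fraction: float = 0.2,
                  split_seed: int = 42) -> dict[str, list[str]]:
    """Shuffle chip names reproducibly and carve off a validation fraction."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    names = list(chip_names)
    random.Random(split_seed).shuffle(names)  # seeded: identical split on every run
    n_val = max(1, int(len(names) * val_fraction))
    return {"val": names[:n_val], "train": names[n_val:]}
```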
Comment thread models/yolo_v8_v2/tests/inside_container_smoke_test.py Outdated
Comment thread models/yolo_v8_v2/Dockerfile Outdated
Comment thread models/yolo_v8_v2/README.md Outdated
Member

@kshitijrajsharma kshitijrajsharma left a comment

Comment thread models/yolo_v8_v2/pipeline.py Outdated
Comment thread models/yolo_v8_v2/pipeline.py Outdated

codecov Bot commented Apr 20, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

…put decoding for enhanced prediction functionality
…ure collection building, and prediction functions for improved clarity and performance
…ling with new image preprocessing and validation split features
patch("models.yolo_v8_segmentation.pipeline.log_metadata"),
):
metrics = evaluate_model.entrypoint(
trained_model=b"fake",
Member

Instead of mocking the model, the idea is to pass the model that was actually trained on the toy data above. This test step should run the real model training, followed by the evaluation with real metrics being produced; the mock should be removed.

"metrics/recall(M)": 0.74,
}

class _MockModel:
Member

Let's remove the mock here as well. It should be something like this:

    split_info = split_dataset.entrypoint(
        dataset_chips=str(toy_chips),
        dataset_labels=str(toy_labels),
        hyperparameters=base_hyperparameters,
    )
    model = train_model.entrypoint(
        dataset_chips=str(toy_chips),
        dataset_labels=str(toy_labels),
        base_model_weights=pretrained_weights,
        hyperparameters=base_hyperparameters,
        split_info=split_info,
        num_classes=2,
    )
    metrics = evaluate_model.entrypoint(
        trained_model=model,
        dataset_chips=str(toy_chips),
        dataset_labels=str(toy_labels),
        hyperparameters=base_hyperparameters,
        split_info=split_info,
        num_classes=2,
    )

@@ -0,0 +1,84 @@
# syntax=docker/dockerfile:1.7

ARG BASE_IMAGE=ghcr.io/hotosm/fair-utilities-yolo:cpu-latest
Member

Use the `gpu` base image here.


ENV UV_LINK_MODE=copy

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/
Member

`uvx` is not needed.

# runtime and tests share one interpreter (do not use `--system` / /usr/local).
RUN --mount=type=cache,target=/root/.cache/uv \
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 \
uv pip install --python /app/.venv/bin/python \
Member

GPU, and I think they are already available in the base image!

# ---------------------------------------------------------------------------
# Inference stage: runtime + serving deps for smoke/live API
# ---------------------------------------------------------------------------
FROM runtime AS inference
Member

Use distroless images and install only the minimal inference dependencies. Example:

FROM gcr.io/distroless/python3-debian12:nonroot AS inference

rasterize=True,
rasterize_options=["binary"],
georeference_images=True,
multimasks=False,
Member

If you pass epsg=4326 here in this function, the patch above might not be required!

Contributor Author

That is why it was working before: when I didn't pass the CRS, execution went down the other branch, which caused the error. That is why I added the patch.

Member

@kshitijrajsharma kshitijrajsharma Apr 22, 2026

Yes, you can pass the 4326 to make it strict (this is strict because the YOLO conversion logic was written in degrees); the conversion would be handled by the code automatically.

Regarding "it went to the other branch": which branch are you referring to?

output_path=output_geojson,
remove_inputs=False,
)
except Exception as exc:
Member

This will return an empty polygon even when there is a bug in polygonize, which should not be the case: the GeoJSON returned from this function would just come back empty. This error handling will make it hard for us to catch the bug!
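A sketch of the reviewer's point, under assumed names (`mask_to_feature` and `polygonize_predictions` are invented for this illustration): let a failure in polygonization propagate instead of catching it and returning an empty GeoJSON.

```python
import json


def mask_to_feature(ring: list[tuple[float, float]]) -> dict:
    """Convert one polygon ring to a GeoJSON feature; raises on degenerate input."""
    if len(ring) < 3:
        raise ValueError(f"Ring with {len(ring)} points cannot form a polygon")
    coords = [list(p) for p in ring] + [list(ring[0])]  # close the ring
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [coords]},
    }


def polygonize_predictions(rings: list[list[tuple[float, float]]]) -> str:
    # No try/except wrapper: a bad ring should fail loudly here, not be
    # swallowed and turned into an empty FeatureCollection that looks like
    # "no buildings detected".
    features = [mask_to_feature(r) for r in rings]
    return json.dumps({"type": "FeatureCollection", "features": features})
```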

return json.load(f)


def _select_or_merge_labels(labels_path: Path, destination: Path) -> None:
Member

But the input is already a single GeoJSON: it's the chips plus one GeoJSON. So this function might be redundant? Why would we need to merge labels that are already in a single GeoJSON file?

return input_dir


def _training_cache_dir(dataset_chips: str, dataset_labels: str) -> Path:
Member

Why do we need a cache for training? It just fills up the container as multiple training runs progress!
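One way to address this concern (a sketch, not the repo's actual code; `training_scratch_dir` is an invented name): scope any scratch space to the run with `tempfile.TemporaryDirectory`, so it is reclaimed when training finishes instead of accumulating in the container.

```python
import tempfile
from pathlib import Path


def training_scratch_dir() -> tempfile.TemporaryDirectory:
    """Per-run scratch space, deleted automatically when the run ends,
    unlike a persistent cache dir that accumulates across trainings."""
    return tempfile.TemporaryDirectory(prefix="yolo-train-")


# Usage: everything staged under `scratch` disappears when the block exits.
# with training_scratch_dir() as scratch:
#     staged = Path(scratch) / "dataset"
#     staged.mkdir()
```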
