Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

473 add training data tool #481

Merged
merged 15 commits into from
Mar 25, 2025

Conversation

okolekar
Copy link
Contributor

Added the code to convert data points to raster. The base raster values will be 0 and the points will have the value as per the attribute selected.

In this commit I have added the argument to the points_to_raster.py as per the user, value of the points will be displayed on the raster.
@nmaarnio
Copy link
Collaborator

Hi @okolekar , what's the status of this PR – is it ready for review?

@okolekar
Copy link
Contributor Author

okolekar commented Jan 27, 2025

There are some changes required.

@okolekar okolekar marked this pull request as draft January 27, 2025 09:38
@okolekar okolekar marked this pull request as ready for review January 31, 2025 08:23
@okolekar
Copy link
Contributor Author

Hi Niko,
the PR is ready for review.
Thankyou for your time and support.

Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @okolekar . Good work with this one, looks like a well-designed tool! However, the points_to_raster tool seems to me a lot like our rasterize tool. The rasterize tool allows also users to buffer around points, so I wonder what exactly do we want out of this tool that rasterize does not handle? I might be just missing something here. I did not yet have time to look into random_sampling.py, I will do that too soon hopefully

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be removed from the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the file to the git ignore.

Comment on lines 83 to 103
@beartype
def save_raster(path: str, array: np.ndarray, meta: dict = None, overwrite: bool = False):
"""Save the given raster NumPy array and metadata in a raster format at the provided path.

Args:
path: Path to store the raster.
array: Raster NumPy array.
meta: Raster metadata.
overwrite: overwrites the existing raster file if present, default false.
"""

if os.path.exists(path) and overwrite is False:
print(f"File already exists: {os.path.basename(path)}.")
return
else:
if array.ndim == 2:
array = np.expand_dims(array, axis=0)

with rasterio.open(path, "w", compress="lzw", **meta) as dst:
dst.write(array)
dst.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be removed as it does not relate to the implemented feature itself

Comment on lines 106 to 140
def _point_to_raster(raster_array, raster_meta, positives, attribute, radius, buffer):
with MemoryFile() as memfile:
raster_meta["driver"] = "GTiff"
with memfile.open(**raster_meta) as datawriter:
datawriter.write(raster_array, 1)

with memfile.open() as memraster:
if not check_matching_crs(
objects=[memraster, positives],
):
raise NonMatchingCrsException("The raster and geodataframe are not in the same CRS.")

# Select only positives that are within raster bounds
positives = positives.cx[
memraster.bounds.left : memraster.bounds.right, # noqa: E203
memraster.bounds.bottom : memraster.bounds.top, # noqa: E203
]

if attribute is not None:
values = positives[attribute]
else:
values = [1]

positives_rows, positives_cols = rasterio.transform.rowcol(
memraster.transform, positives.geometry.x, positives.geometry.y
)
raster_array[positives_rows, positives_cols] = values

unique_values = list(set(values))

if radius is not None:
for target_value in unique_values:
raster_array = _create_buffer_around_labels(raster_array, radius, target_value, buffer)

return raster_array, raster_meta
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refactor this function and leave out the MemoryFile stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi niko,
The MemoryFile stuff helps selecting the correct points form the geodataframe.
I tried to achieve the same output using the transform from the raster meta data however I could not achieve the same values of the raster bounds as I get it from the memory file. There is always some error associated.
Should I still remove the memoryfile?

Comment on lines 145 to 158
positives: geopandas.GeoDataFrame,
attribute: Optional[str] = None,
radius: Optional[int] = None,
buffer: Optional[str] = None,
template_raster: Optional[rasterio.io.DatasetReader] = None,
coord_west: Optional[Number] = None,
coord_north: Optional[Number] = None,
coord_east: Optional[Number] = None,
coord_south: Optional[Number] = None,
target_epsg: Optional[int] = None,
target_pixel_size: Optional[int] = None,
raster_width: Optional[int] = None,
raster_height: Optional[int] = None,
nodata_value: Optional[Number] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would harmonize the inputs of this function to match how we have implemented other EIS Toolkit tools. That is, when a base raster is needed, the user needs to give raster profile / metadata and the base raster is constructed from that, see e.g. distance_computation implementation. We will provide the user the option to give either a base raster or define pixel size + extent in QGIS if they wish to do so. So I suggest us to remove all parameters from template_raster up until nodata_value and simply put raster_profile parameter there instead

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ways to define a base raster you have defined here are anyway available as create_constant_raster_... tools separately

the desired number of pixels for rows and columns.

Args:
positives: The geodataframe points set to be converted into raster.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first parameter could be more generic like deposits or just geodataframe – unless this tool is meant just for positive point sample, in which case I think the tool's and Python module's names should reflect this more precisely.

@okolekar
Copy link
Contributor Author

Hi @okolekar . Good work with this one, looks like a well-designed tool! However, the points_to_raster tool seems to me a lot like our rasterize tool. The rasterize tool allows also users to buffer around points, so I wonder what exactly do we want out of this tool that rasterize does not handle? I might be just missing something here. I did not yet have time to look into random_sampling.py, I will do that too soon hopefully

This tool allows us to select the radius size for the buffer and also allows to select the type of buffer (min max or average) In addition to the above mentioned points it would also be possible (in the future) to select the shape of the buffer to be applied like square or circular if requested.

Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Omkar, I took another look at this and I'd just make a small modification to one parameter type. Other than that, one test seems to be failing and the docs files missing (that go here) but after they are fixed, we can merge this!

raster_profile: Union[profiles.Profile, dict],
attribute: Optional[str] = None,
radius: Optional[int] = None,
buffer: Optional[str] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend making this parameter a Literal type, since only certain values are accepted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Niko,
I have made the requested changes!

Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, merging!

@nmaarnio nmaarnio merged commit 82bc314 into GispoCoding:master Mar 25, 2025
2 of 4 checks passed
@nmaarnio nmaarnio linked an issue Mar 25, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New tool] - training data tool required
2 participants