
Introducing an option for the user to decide on simplifying GADM shapes #1138

Open
wants to merge 11 commits into
base: main
from

@SermishaNarayana commented Oct 10, 2024

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@SermishaNarayana
Author

[Screenshot 2024-10-10 at 6 03 39 PM: difference between the US shapes with and without GADM simplification]
@ekatef Here, I have plotted the difference between the US shapes with and without simplification of the GADM shapes. The differences lie mostly in the treatment of some small islands and along the borders of the US states.
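The missing small islands are consistent with a simplification step that, besides reducing vertices, also drops polygon parts below a minimum area (the minarea setting mentioned later in this thread). A minimal sketch of that behaviour with shapely follows; `drop_small_parts`, `simplify_shape` and the 0.01 thresholds are hypothetical and illustrative, not the project's `_simplify_polys`:

```python
from shapely.geometry import MultiPolygon, box


def drop_small_parts(geom, minarea: float):
    """Hypothetical helper: keep only the parts of a MultiPolygon whose area
    exceeds `minarea` (the largest part is always kept)."""
    if isinstance(geom, MultiPolygon):
        parts = sorted(geom.geoms, key=lambda p: p.area, reverse=True)
        kept = [parts[0]] + [p for p in parts[1:] if p.area > minarea]
        return kept[0] if len(kept) == 1 else MultiPolygon(kept)
    return geom


def simplify_shape(geom, tolerance: float = 0.01, minarea: float = 0.01):
    """Drop tiny parts first, then simplify the remaining outline."""
    return drop_small_parts(geom, minarea).simplify(tolerance, preserve_topology=True)


# A "mainland" unit square plus a tiny island of area ~0.0025 (below minarea).
mainland = box(0, 0, 1, 1)
island = box(2, 2, 2.05, 2.05)
state = MultiPolygon([mainland, island])

simplified = simplify_shape(state)
print(state.geom_type, len(state.geoms))
print(simplified.geom_type)
```

With simplification disabled, both parts would be kept, which could explain the extra islands visible in the non-simplified shapes.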

Member

@davide-f left a comment

Great contribution :D I added a comment; please also add a release note.
We are very close, I believe :D

@@ -106,6 +106,7 @@ cluster_options:

build_shape_options:
  gadm_layer_id: 1 # GADM level area used for the gadm_shapes. Codes are country-dependent but roughly: 0: country, 1: region/county-like, 2: municipality-like
  simplify_gadm: false # When true, shape polygons are simplified; otherwise they are left as-is
Member

Great @SermishaNarayana :D
What about making this option numeric? For example, we could rename it simplify_tolerance, with a default of 0.01 (the current default value), and skip the simplification if the value is False or <= 0.
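The proposed semantics can be sketched as follows. `resolve_tolerance` and `maybe_simplify` are hypothetical names, and treating `True` as the 0.01 default is an extra assumption beyond the suggestion above; shapely's `simplify(tolerance, preserve_topology=True)` does the actual work:

```python
from typing import Optional, Union

from shapely.geometry import Point
from shapely.geometry.base import BaseGeometry

DEFAULT_TOLERANCE = 0.01  # current default value mentioned in the review


def resolve_tolerance(value: Union[bool, float, None]) -> Optional[float]:
    """Hypothetical: map a `simplify_tolerance` config value to a tolerance,
    or return None when simplification should be skipped."""
    if value is None or value is False:
        return None
    if value is True:  # assumption: `true` means "use the default"
        return DEFAULT_TOLERANCE
    tolerance = float(value)
    return tolerance if tolerance > 0 else None  # 0 or negative disables it


def maybe_simplify(geom: BaseGeometry, value) -> BaseGeometry:
    """Simplify `geom` only when the config value enables it."""
    tolerance = resolve_tolerance(value)
    if tolerance is None:
        return geom  # disabled: keep full resolution
    return geom.simplify(tolerance, preserve_topology=True)


circle = Point(0, 0).buffer(1.0)
for setting in (False, 0, -1, 0.01, 0.1):
    result = maybe_simplify(circle, setting)
    print(setting, len(result.exterior.coords))
```

The `is False` / `is True` checks are deliberate: in Python `0 == False` and `True` converts to `1.0`, so explicit identity checks keep the boolean and numeric cases apart.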

Author

@davide-f Yes, I am planning to add it, and it is already in progress. In the meantime, I was trying to understand why the regions_onshore.geojson file size explodes without simplification.

Member

@SermishaNarayana @davide-f thanks a lot for taking care of that!

I'm afraid a numerical simplification option may not work well, because of how the _simplify_polys() function itself behaves...

@jome1 did a great investigation of the behaviour of _simplify_polys(), which showed that all polygons are simplified independently of each other. This can lead to a number of "enclaves" emerging along the borders between regions.
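A small, self-contained illustration of why independent simplification is a problem: three hypothetical regions tile a rectangle, and a junction vertex that is a real corner for two of them is only a slight bump for the third. Simplifying each polygon separately removes the bump from one side only, so the regions start to overlap. The geometry and tolerance are made up, and plain shapely `simplify` stands in for `_simplify_polys`:

```python
from shapely.geometry import Polygon

# Three hypothetical regions tiling the rectangle [0, 2] x [0, 1]. The point
# (1.05, 0.5) is a junction: a corner for `lower` and `upper`, but only a
# slight bump on the left edge of `right`.
lower = Polygon([(0, 0), (1, 0), (1.05, 0.5), (0, 0.5)])
upper = Polygon([(0, 0.5), (1.05, 0.5), (1, 1), (0, 1)])
right = Polygon([(1, 0), (2, 0), (2, 1), (1, 1), (1.05, 0.5)])
regions = [lower, upper, right]

TOLERANCE = 0.1  # larger than the 0.05 bump, smaller than the real corners


def total_overlap(polys):
    """Sum of pairwise intersection areas (0 for a clean tiling)."""
    return sum(
        polys[i].intersection(polys[j]).area
        for i in range(len(polys))
        for j in range(i + 1, len(polys))
    )


# Each polygon is simplified on its own, without knowledge of its neighbours.
simplified = [p.simplify(TOLERANCE, preserve_topology=True) for p in regions]

print(f"overlap before: {total_overlap(regions):.4f}")
print(f"overlap after:  {total_overlap(simplified):.4f}")
```

Shapely 2.1 added `shapely.coverage_simplify`, which simplifies a set of polygons while keeping their shared edges consistent; whether that is the improved algorithm mentioned in this comment is an assumption.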

I suspect the explosion of the polygons you observed may be related to that: calling _simplify_polys() on such complex geometry produces a large number of these enclaves. The good news is that the issue is likely to be resolved in the next release of shapely, which should contain an improved simplification algorithm.

So I'd probably keep a boolean flag for now and return to the idea of a numeric parameter once a more advanced simplification option is available in shapely. What do you think?

Author

@ekatef

If I understand it correctly, the problem you mention occurs when the polygons are simplified with the _simplify_polys() function in build_shapes. However, in the current case, the regions_onshore.geojson file size explodes when GADM simplification is turned off, and in that case the code skips _simplify_polys(). Is the issue still related somehow?

Please correct me if I have misunderstood.
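The author's point can be sketched with a hypothetical helper `prepare_shapes` keyed on the `simplify_gadm` flag from the config diff above: when the flag is off, the simplification call is never reached, so the geometries pass through unchanged at full resolution. This is an illustration, not the actual build_shapes code:

```python
from shapely.geometry import Point


def prepare_shapes(shapes: list, build_shape_options: dict, tolerance: float = 0.01) -> list:
    """Hypothetical sketch of the flag's effect: simplify only when
    `simplify_gadm` is true; otherwise return the shapes untouched."""
    if not build_shape_options["simplify_gadm"]:
        return list(shapes)  # simplification step skipped entirely
    return [s.simplify(tolerance, preserve_topology=True) for s in shapes]


shapes = [Point(0, 0).buffer(1.0), Point(3, 0).buffer(1.0)]
raw = prepare_shapes(shapes, {"simplify_gadm": False})
simple = prepare_shapes(shapes, {"simplify_gadm": True}, tolerance=0.1)
print([len(s.exterior.coords) for s in raw], [len(s.exterior.coords) for s in simple])
```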

Member

@SermishaNarayana you are absolutely right: it is _simplify_polys() that causes the issues in #1051.

I had not realised that you are observing the file expansion when _simplify_polys() is bypassed. In that case, I'm not sure it is really related to #1051. Thank you so much for the explanation!

Just to be sure I understand the problem: you are also observing some strange geometry effects, right? If so, could you please post a picture there? It would be great to understand what is going on with the geometries.

Member

@davide-f Oct 16, 2024

@SermishaNarayana could you also share the gadm_shapes and the whole shapes folder? simplify_polys is applied to gadm_shapes first, and with alternative_clustering I expected those shapes to be used.
The output of bus_regions is regions_onshore; other rules then edit these regions further.
You may have found another bug further down the chain, but it is very likely not linked to this PR.

Author

@davide-f

Sure, I have attached all the files in the shape folder
shape_files.zip

Also, regarding a numerical tolerance option, I am planning to raise it as a separate PR. Is that alright? And to clarify: a tolerance value of 0 has the same issues as not simplifying the GADM shapes at all.

Member

The gadm_shapes here are fine, so this PR should be ok. The problem is likely later on.

Let's also keep the flag true/false and open an issue for custom tolerances and minarea.

Feel free to mark this PR as ready for review whenever it is ready. Please add a line to the release notes :)

Author

@SermishaNarayana Oct 16, 2024

@davide-f Apologies, the files I uploaded earlier were generated with the simplify_gadm option turned ON. I am attaching another set of files with the simplify_gadm option turned OFF.

shape_files_non_simplified.zip

Could you please check whether these files look alright?
I will add the release note :)

Member

Great! At first glance it looks good.
This provides a great comparison and a justification for adding custom tolerances.
The feedback I see is the following:

  1. The number of shapes in the two datasets is the same: 55 shapes in both. This means that the shapes are conserved.
  2. The "simplified" dataset is 0.7 MB, while the non-simplified one is 73 MB. This is a good reason to use simplification. For some applications, however, the default tolerance may simplify too much, and a lower level of simplification may be acceptable.
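Both observations can be checked with a quick sketch: count the shapes and measure their serialized GeoJSON size for different tolerances. The 55 high-resolution circles are hypothetical stand-ins for GADM shapes, and `geojson_size` is an illustrative helper, so the absolute numbers will not match the 0.7 MB and 73 MB above:

```python
import json

from shapely.geometry import Point, mapping


def geojson_size(geoms) -> int:
    """Approximate serialized size in bytes of geometries as a GeoJSON FeatureCollection."""
    features = [{"type": "Feature", "geometry": mapping(g), "properties": {}} for g in geoms]
    return len(json.dumps({"type": "FeatureCollection", "features": features}))


# Hypothetical stand-ins for detailed shapes: circles with 256 segments per quarter.
shapes = [Point(i * 3.0, 0).buffer(1.0, 256) for i in range(55)]

results = {}
for tolerance in (None, 0.001, 0.01):
    if tolerance is None:
        out = shapes  # no simplification
    else:
        out = [s.simplify(tolerance, preserve_topology=True) for s in shapes]
    results[tolerance] = (len(out), geojson_size(out))
    print(tolerance, results[tolerance])
```

Simplification only changes the number of vertices per shape, never the number of shapes, which is why the shape count is conserved while the file size shrinks.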

The shapes are consistent, and the bug you found later on in the simplification is not visible here. This suggests that the PR is ready to go functionality-wise :)

@SermishaNarayana SermishaNarayana marked this pull request as ready for review October 17, 2024 10:22
4 participants