This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

Analysing overlapping features in OSM using tile-reduce #145

Open
amishas157 opened this issue Apr 7, 2017 · 19 comments

Comments

@amishas157
Contributor

Ref: Currently we have a feature overlap comparator that flags all newly added (version 1) water bodies which overlap with any existing feature.

Following are the uncertainties discussed by @bkowshik in the referenced post.

  • What zoom level tiles should be downloaded from the API?
    Tiles at lower zoom levels don't have all the data. Ex: Buildings generally show up in tiles greater than zoom 15.
  • There could be good overlaps too. We need to differentiate between a good and a harmful overlap.
  • Lakes and parks features fit this use-case well. What other feature types can something like this handle?

Per discussion w/ @bkowshik, to study the issues above further, we can run tile-reduce to find existing overlapping features in OSM and visualize the different overlapping combinations and the count of such features. This would give us a list of feature types that can help us tighten this compare function.

The way I see it, we will get counts in the following form:
Overlapping feature type | Overlapped feature type | Count
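
A minimal sketch of how such a table could be tallied once overlapping pairs are known. The input shape (`{ overlapping, overlapped }`) and the function name `tallyOverlaps` are assumptions for illustration, not part of the actual tile-reduce script.

```javascript
// Tally overlap pairs into rows of { overlapping, overlapped, count }.
// The pair objects are a hypothetical shape chosen for this sketch.
function tallyOverlaps(pairs) {
  const rows = new Map();
  for (const { overlapping, overlapped } of pairs) {
    // Direction matters: (water, building) and (building, water) are kept apart.
    const key = JSON.stringify([overlapping, overlapped]);
    if (!rows.has(key)) rows.set(key, { overlapping, overlapped, count: 0 });
    rows.get(key).count += 1;
  }
  // Most frequent combination first.
  return [...rows.values()].sort((a, b) => b.count - a.count);
}

const table = tallyOverlaps([
  { overlapping: 'natural=water', overlapped: 'building=yes' },
  { overlapping: 'natural=water', overlapped: 'highway=footway' },
  { overlapping: 'natural=water', overlapped: 'building=yes' }
]);
for (const row of table) {
  console.log(`${row.overlapping} | ${row.overlapped} | ${row.count}`);
}
```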

Here is the tile-reduce script I am working on. It finds all water bodies in a tile and then checks for other features in the same tile that they overlap.

One issue with the above process is that we will miss counts from relations, as the mbtiles used in tile-reduce don't contain relation-type features.

Would be glad to get feedback from the team.

cc @batpad @geohacker @planemad @lukasmartinelli @ian29

@geohacker
Contributor

This is a great idea! OSM QA Tiles aren't the best for this research though, because of missing relations, tile edges, and multipolygon handling.

@amishas157 Let's use osmium. Take a look at osmlazer for a sketch.

@amishas157
Contributor Author

@geohacker Yes, that would be great. Here is how I am thinking of approaching this problem:

Find the overlapping features with an area of overlap greater than a threshold value and collect the different kinds of overlapping and overlapped features. Then do a deeper analysis on each combination of the two.

@amishas157
Contributor Author

amishas157 commented Apr 13, 2017

Analysing Feature Overlaps

The idea behind this analysis is to see what kinds of overlaps exist in OSM, estimate how many there are, and differentiate good overlaps from bad ones. An example of a good one is a building overlapping with a landuse feature, while a bad one is a water feature overlapping buildings.

The above process is carried out as follows:

  1. OsmLazer converts the .pbf files into an output JSON file. (We use .pbf files rather than OSM QA Tiles because OSM QA Tiles don't include relations.) Only ways and relations are kept from the .pbf files. Relation members are indexed by member id, and this index is used later in the data processing, as explained in the steps below. Features are also filtered on the presence of primary tags, and features with a layer tag are ignored, since layer = +ve/-ve number can give false positives for feature overlap: such features don't actually make bad overlaps.
Primary Tags List: 'aerialway', 'aeroway', 'amenity', 'barrier', 'boundary', 'building', 'craft', 'emergency', 'geological', 'highway', 'landuse', 'leisure', 'man_made', 'military', 'natural', 'office', 'places', 'power', 'public_transport', 'railway', 'route', 'shop', 'sport', 'tourism', 'waterway'
  2. Tippecanoe is used to cut the JSON file into mbtiles with both min and max zoom level set to 16.

  3. A tile-reduce script processes the features in the above list. Overlaps are calculated as follows:

    • Each item in the feature list is checked against every other member of the list.
    • If the two items being checked belong to the same relation, they are ignored. The relation indexing mentioned in the steps above helps in doing this.
    • Overlaps are considered to be of 3 types: Area feature w/ Area feature, Area feature w/ Line feature, and Line feature w/ Line feature.
    • For this process, we have ignored plain intersections, i.e. Line w/ Line features.
    • For Area w/ Area features, if the area of intersection is greater than a certain threshold, it is considered an overlap.
    • For Area w/ Line features, if the line feature crosses the area feature, it is considered an overlap. A line feature sharing a boundary with an area feature is not considered an overlap.
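
The filtering and pair-selection rules in the steps above can be sketched roughly as follows. This is not the actual script: the feature shape (`id`, `tags`, `isArea`), the `relationIndex` layout, and all function names are assumptions, and `PRIMARY_TAGS` is abbreviated. The geometric checks themselves (intersection area, line crossing) are left out.

```javascript
// Abbreviated for the sketch; the full list is given in step 1.
const PRIMARY_TAGS = ['building', 'highway', 'landuse', 'leisure', 'natural', 'waterway'];

// Keep a feature only if it has a primary tag and no layer tag.
function isCandidate(feature) {
  const tags = feature.tags || {};
  if ('layer' in tags) return false;
  return PRIMARY_TAGS.some((key) => key in tags);
}

// relationIndex (hypothetical) maps a member id to the ids of its relations.
function shareRelation(a, b, relationIndex) {
  const relsA = relationIndex[a.id] || [];
  const relsB = new Set(relationIndex[b.id] || []);
  return relsA.some((rel) => relsB.has(rel));
}

// Decide which check applies to a pair; line/line pairs are skipped.
function pairKind(a, b) {
  if (a.isArea && b.isArea) return 'area-area';
  if (a.isArea || b.isArea) return 'area-line';
  return null;
}

// Visit every unordered pair once and keep the ones worth a geometry check.
function candidatePairs(features, relationIndex) {
  const kept = features.filter(isCandidate);
  const pairs = [];
  for (let i = 0; i < kept.length; i++) {
    for (let j = i + 1; j < kept.length; j++) {
      if (shareRelation(kept[i], kept[j], relationIndex)) continue;
      const kind = pairKind(kept[i], kept[j]);
      if (kind) pairs.push({ a: kept[i].id, b: kept[j].id, kind });
    }
  }
  return pairs;
}
```

Each returned pair would then go through the matching check: intersection area against the threshold for 'area-area', and a crossing test for 'area-line'.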

The analysis was first run for Monaco, a small city-state in Europe.

  • Total features having primary tags: 3192

Results from overlaps:

{ '{"building":"construction"},{"boundary":"administrative"}': 1,
 '{"building":"construction"},{"highway":"steps"}': 1,
 '{"building":"yes"},{"highway":"footway"}': 18,
 '{"leisure":"swimming_pool"},{"building":"yes"}': 4,
 '{"leisure":"park"},{"highway":"footway"}': 13,
 '{"leisure":"park"},{"highway":"service"}': 3,
 '{"leisure":"park"},{"landuse":"construction","building":"construction"}': 1,
 '{"building":"yes"},{"boundary":"administrative"}': 15,
 '{"building":"yes"},{"highway":"steps"}': 7,
 '{"sport":"swimming","amenity":"swimming_pool"},{"leisure":"swimming_pool"}': 2,
 '{"building":"yes"},{"highway":"service"}': 5,
 '{"building":"yes"},{"landuse":"residential"}': 5,
 '{"landuse":"residential"},{"highway":"footway"}': 2,
 '{"landuse":"residential"},{"highway":"service"}': 6,
 '{"highway":"footway"},{"natural":"water"}': 2,
 '{"leisure":"park"},{"highway":"steps"}': 2,
 '{"amenity":"fountain"},{"leisure":"park"}': 2,
 '{"building":"yes"},{"leisure":"park","tourism":"attraction"}': 2,
 '{"natural":"water"},{"leisure":"park","tourism":"attraction"}': 2,
 '{"leisure":"park","tourism":"attraction"},{"highway":"footway"}': 6,
 '{"leisure":"swimming_pool"},{"boundary":"administrative"}': 1,
 '{"leisure":"swimming_pool"},{"highway":"footway"}': 1,
 '{"highway":"secondary"},{"boundary":"administrative"}': 1,
 '{"leisure":"playground"},{"highway":"footway"}': 1,
 '{"highway":"primary"},{"boundary":"administrative"}': 1,
 '{"building":"residential"},{"boundary":"administrative"}': 1,
 '{"building":"residential"},{"highway":"primary"}': 1,
 '{"building":"residential"},{"highway":"footway"}': 1,
 '{"building":"residential"},{"highway":"service"}': 1,
 '{"building":"yes"},{"highway":"residential"}': 4,
 '{"man_made":"pier"},{"natural":"coastline"}': 1,
 '{"building":"public"},{"highway":"footway"}': 1,
 '{"building":"public"},{"boundary":"administrative"}': 1,
 '{"building":"public"},{"highway":"service"}': 1,
 '{"building":"yes"},{"landuse":"cemetery"}': 6,
 '{"building":"yes"},{"highway":"primary"}': 3,
 '{"landuse":"cemetery"},{"highway":"service"}': 3,
 '{"building":"apartments"},{"highway":"primary"}': 1,
 '{"building":"yes"},{"barrier":"retaining_wall"}': 1,
 '{"leisure":"miniature_golf"},{"leisure":"park"}': 1,
 '{"building":"yes"},{"highway":"pedestrian"}': 8,
 '{"highway":"pedestrian"},{"highway":"pedestrian"}': 7,
 '{"leisure":"garden"},{"highway":"footway"}': 22,
 '{"amenity":"toilets","building":"yes"},{"leisure":"garden"}': 1,
 '{"building":"yes"},{"leisure":"garden"}': 1,
 '{"highway":"tertiary"},{"leisure":"sports_centre"}': 1,
 '{"highway":"tertiary"},{"boundary":"administrative"}': 1,
 '{"highway":"footway","man_made":"pier"},{"building":"yes"}': 2,
 '{"building":"commercial"},{"highway":"footway"}': 16,
 '{"building":"commercial"},{"highway":"steps"}': 8,
 '{"building":"commercial"},{"highway":"pedestrian"}': 1,
 '{"highway":"footway"},{"highway":"pedestrian"}': 14,
 '{"highway":"pedestrian"},{"leisure":"park"}': 1,
 '{"highway":"footway"},{"highway":"footway"}': 2,
 '{"highway":"pedestrian"},{"boundary":"administrative"}': 1,
 '{"highway":"pedestrian"},{"highway":"secondary"}': 1,
 '{"highway":"pedestrian"},{"highway":"service"}': 1,
 '{"amenity":"police","building":"yes"},{"highway":"residential"}': 1,
 '{"building":"yes"},{"leisure":"park"}': 1,
 '{"amenity":"shelter"},{"leisure":"park"}': 1,
 '{"leisure":"swimming_pool"},{"leisure":"garden"}': 1,
 '{"building":"yes"},{"amenity":"school"}': 2,
 '{"amenity":"school"},{"highway":"footway"}': 1,
 '{"leisure":"swimming_pool"},{"tourism":"hotel","building":"yes"}': 1,
 '{"building":"hangar"},{"aeroway":"heliport"}': 2,
 '{"aeroway":"helipad"},{"aeroway":"heliport"}': 8,
 '{"aeroway":"terminal","building":"yes"},{"aeroway":"heliport"}': 1,
 '{"aeroway":"heliport"},{"highway":"service"}': 1,
 '{"building":"yes"},{"aeroway":"heliport"}': 1 }

In each key, the first JSON object lists the primary tags present in feature 1 and the second lists the primary tags present in feature 2. The value is the count of such overlaps.
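
For anyone who wants to work with these results programmatically, here is a small sketch of how a key can be split back into its two tag objects. It relies on each key being two JSON objects joined by a comma, as in the output above; the function names are made up for this example.

```javascript
// Wrapping the key in brackets turns it into a valid JSON array of two objects.
function parseOverlapKey(key) {
  const [feature1, feature2] = JSON.parse('[' + key + ']');
  return { feature1, feature2 };
}

// Render a tag object as key=value pairs, e.g. "leisure=park;tourism=attraction".
function formatTags(tags) {
  return Object.entries(tags).map(([k, v]) => k + '=' + v).join(';');
}

// Convert the whole result object into [feature1, feature2, count] rows.
function toRows(result) {
  return Object.entries(result).map(([key, count]) => {
    const { feature1, feature2 } = parseOverlapKey(key);
    return [formatTags(feature1), formatTags(feature2), count];
  });
}

const sample = {
  '{"building":"yes"},{"highway":"footway"}': 18,
  '{"leisure":"park","tourism":"attraction"},{"highway":"footway"}': 6
};
for (const row of toRows(sample)) console.log(row.join(','));
```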

In the above results, the most overlaps are found for buildings overlapping with highway:footway. But one harmful combination found is {"building":"yes"},{"highway":"primary"}. More analysis is needed to see what is actually happening in OSM.

Next action is to run the same process for a large city and see what comes out.

cc @bkowshik @geohacker @batpad

@lukasmartinelli
Contributor

Wow! Awesome analysis Amisha!

@batpad
Contributor

batpad commented Apr 14, 2017

@amishas157 💥 ! this analysis is amazing.

Is it possible to list down what next actions look like to you here?

@amishas157
Contributor Author

amishas157 commented Apr 14, 2017

Next actions:

  • Discuss these results with a member of the validation team and separate the good overlaps from the bad ones.
  • Do a similar analysis for North America. This process is now running in the background on a morec2 machine.

@bkowshik
Contributor

Cleaned up the JSON @amishas157 posted ^ into a csv for 👀 better


It is super-interesting to see footway features overlap with so many other features. Out of the total 70 rows, there are 16 rows with a footway feature either in the first or the second column.

The highest overlap of 22 between garden and footway makes sense right? There are lots of footway in garden features.

@amishas157
Contributor Author

Here is the updated JSON object, after removing a few dups and improving the logic a bit: https://gist.github.com/amishas157/ec0f042d7e69a576a337d156742547f5

Thanks @bkowshik for the CSV. 🙇‍♀️

@bkowshik

The highest overlap of 22 between garden and footway makes sense right? There are lots of footway in garden features.

Yes, correct. But this seems to be a legit kind of overlap, no 🤔?

@bkowshik
Contributor

Per voice with @manoharuss and @amishas157


Priority

  • Focus on overlaps between building, water and highway
  • For highways, residential and up are a good start.
  • Any highway overlap with other highway can be ignored.

Rendering

  • This is very important!!!
  • Water is generally among the top layers, thus when a building overlaps with water, water gets rendered on the map.

Percentage of overlap

  • TODO: To explore later
  • What should be the threshold/percentage of overlap?
  • Ex: 100% of footway was overlapped by building but 10% of building is overlapping the footway.
  • A lake is overlapped by just 1 building with an overlap of 10% vs overlapped with 20 buildings with an overlap of 80%
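
To make the asymmetry concrete, here is a small sketch that expresses one overlap as a share of each feature's own area. Axis-aligned boxes `[minX, minY, maxX, maxY]` stand in for real geometries, and the function names are assumptions for illustration; both boxes are assumed to have non-zero area.

```javascript
function boxArea([minX, minY, maxX, maxY]) {
  return Math.max(0, maxX - minX) * Math.max(0, maxY - minY);
}

// Area shared by two axis-aligned boxes, or 0 if they don't overlap.
function intersectionArea(a, b) {
  const w = Math.min(a[2], b[2]) - Math.max(a[0], b[0]);
  const h = Math.min(a[3], b[3]) - Math.max(a[1], b[1]);
  return w > 0 && h > 0 ? w * h : 0;
}

// The same overlap, seen from each feature's side.
function overlapShares(a, b) {
  const shared = intersectionArea(a, b);
  return { shareOfA: shared / boxArea(a), shareOfB: shared / boxArea(b) };
}

// A small box lying fully inside a box ten times its area.
const small = [0, 0, 1, 1];
const large = [0, 0, 10, 1];
console.log(overlapShares(small, large)); // shareOfA is 1, shareOfB is 0.1
```

A threshold on only one of the two shares would treat these features very differently, so it seems worth deciding explicitly which share (or which combination) the comparator should use.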

Noise

  • Administrative boundaries generally overlap with a lot of features
  • Any land type features, Ex: cemetery, can have overlapping buildings

area=yes mapping convention

  • highway: pedestrian and area: yes is acceptable mapping on OpenStreetMap

Bad combination

  • When a highway >= residential overlaps with leisure=*
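
A rough sketch of how the highway rules from these notes could be encoded. The list of classes counted as "residential and up" is an assumption based on the usual OSM road hierarchy (link roads are left out), and the function names are illustrative, not taken from osm-compare.

```javascript
// Assumed reading of "residential and up"; adjust to taste.
const RESIDENTIAL_AND_UP = [
  'residential', 'unclassified', 'tertiary', 'secondary', 'primary', 'trunk', 'motorway'
];

function isResidentialOrUp(tags) {
  return RESIDENTIAL_AND_UP.includes(tags.highway);
}

// Highway w/ highway overlaps are ignored per the priority notes.
function isHighwayPair(a, b) {
  return 'highway' in a && 'highway' in b;
}

// Bad combination: a highway >= residential overlapping any leisure=* feature.
function isBadCombination(a, b) {
  if (isHighwayPair(a, b)) return false;
  return (isResidentialOrUp(a) && 'leisure' in b) ||
         (isResidentialOrUp(b) && 'leisure' in a);
}

console.log(isBadCombination({ highway: 'tertiary' }, { leisure: 'sports_centre' })); // true
console.log(isBadCombination({ highway: 'footway' }, { leisure: 'park' })); // false
```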

@amishas157
Contributor Author

amishas157 commented Apr 19, 2017

Updates

@amishas157
Contributor Author

Analysis for overlaps between natural:water and building:yes

Total number of overlaps found: 37

Based on eyeballing, these overlaps can be categorized as follows:

  • Case 1: Harmful overlap with a feature being overlapped by multiple features. (Number of such features: 1)

[screenshot]

  • Case 2: Harmful overlap with a feature overlapped by only 1 other feature. (Number of such features: 1)

[screenshot]

  • Case 3: Not harmful overlap but good to detect. (Number of such features: 2)

[screenshot]

[screenshot]

  • Case 4: Just a mapping issue which can be ignored: (Number of such features: 15)

[screenshot]

Learnings:

  • Case 1 clearly indicates that if a feature is overlapped by a good number of features, it has a high probability of being harmful.
  • Case 2 suggests that if a feature overlaps another feature to a great extent (in terms of area), it has a good probability of being harmful.
  • Case 3 contains the cases in which both of the above factors are satisfied but the feature is still not harmful. But I think it's fine to catch these kinds of overlaps as well.
  • Case 4 contains the cases which are mostly due to mapping errors and can be ignored, as they are not harmful to have on the map and they add a lot of noise. By noise I mean that these cases happen very frequently.
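
One possible way to turn the first two learnings into a check, sketched under loud assumptions: both thresholds below are made-up placeholders, `share` is whatever fraction (0..1) the comparator ends up using for overlap extent, and `looksHarmful` is a hypothetical name.

```javascript
// Placeholder thresholds, not values from the comparator.
const MIN_OVERLAPPING_FEATURES = 5;
const MIN_OVERLAP_SHARE = 0.5;

// overlaps: one entry per feature that overlaps the feature under review.
function looksHarmful(overlaps) {
  // Case 1: overlapped by a good number of features.
  if (overlaps.length >= MIN_OVERLAPPING_FEATURES) return true;
  // Case 2: at least one overlap covering a large extent.
  return overlaps.some((o) => o.share >= MIN_OVERLAP_SHARE);
}

console.log(looksHarmful([{ share: 0.8 }])); // true
console.log(looksHarmful([{ share: 0.05 }, { share: 0.1 }])); // false
```

As with Case 3, a check like this would still catch some overlaps that turn out not to be harmful, while small Case 4 style overlaps would fall below both thresholds.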

@lukasmartinelli
Contributor

lukasmartinelli commented Apr 21, 2017

I still think it is a valuable addition, especially since, while Case 1 is not found very often in the map, it is some bad vandalism we've seen before.

Apart from water, I think this will help detect Pokemon users adding new parks on top of buildings.

@lukasmartinelli
Contributor

Also, the detailed documentation of how you approached this problem is an inspiring example! Thanks for digging into this.

@bkowshik
Contributor

Really enjoying how this is moving, awesome work @amishas157 🎉

@manoharuss
Contributor

Awesome work @amishas157.

@krishnanammala and I reviewed 32 changesets, of which 3 were found to be actionable. Hit rate: 9.4% (3/32)

Observations:

Overlap feedback

  1. Pitch and rock can be excluded from the leisure overlap check https://github.com/mapbox/osm-compare/blob/master/comparators/feature_overlap.js#L164
  2. Can we have a threshold on the % of overlap required before flagging, to avoid flagging rough tracing and imagery-offset-based mapping? https://osmcha.mapbox.com/48187204/, https://osmcha.mapbox.com/48184134/
  3. This feature only overlaps with a pedestrian highway, wrong detection? https://osmcha.mapbox.com/48176599/
  4. Some buildings do overlap with parks https://osmcha.mapbox.com/48171466/. I think we have to deal with this kind of noise.
  5. Wrong detections? https://www.openstreetmap.org/way/489576331/history, https://osmcha.mapbox.com/48163148/

@manoharuss
Contributor

manoharuss commented May 24, 2017

@amishas157 This changeset was flagged with 3 features for Feature overlap comparator. https://osmcha.mapbox.com/48936480/

  1. 495603088 - 1st feature was a building and was only sharing a boundary with the next building
  2. The other 2 features that were flagged did not have any overlap with any other feature at all.

@manoharuss
Contributor

manoharuss commented Jun 6, 2017

Posting here for visibility

  • Feature overlap comparator flags overlaps for one target feature at a time
  • This means some flagged edits are not really overlapping, as the features they appear to overlap were moved in the same changeset

[screenshot]

Will post more notes after a sample review.

@manoharuss
Contributor

manoharuss commented Jun 6, 2017

Review Feedback 6th June

I reviewed unchecked changesets flagged by the feature overlap comparator in OSMCha and captured notes on the noise observed

  1. Observed more cases like the one in the above comment, where the feature overlap was detected within that same changeset.

[image]

  • Another similar example but more complicated - https://osmcha.mapbox.com/49300139/ - I am pretty sure the overlap was found between the new version and the old version of the same feature, as the old version had a building tag. Checked in JOSM, there was no other data nearby.
  2. Observed a case where a leisure = park was flagged for feature overlap. It was hard to understand which other feature the overlap was with, as the data seemed to be as expected. Changeset: https://osmcha.mapbox.com/49300341/. This changeset is a good example to learn from and remove a few values from the list of feature types we are checking for. Example: Remove amenity = toilets when checking overlap combinations for leisure = park

  3. Feature flagged in this changeset leisure = park has a couple of legit buildings inside it

[image]

Let us consider this as an exception for leisure = park vs buildings, but the park originated from an experienced user with 3k changesets. So maybe we should think about adding a new-user condition to the comparator.

I also have a doubt about max zoom - https://github.com/mapbox/osm-compare/blob/master/comparators/feature_overlap.js#L14

  • I have a feeling that at zoom level 16, features are much closer to each other on the tile but not so much on zoom level 19 tile for example. Does this also contribute to wrong flags?
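
For reference when weighing this, a quick sketch of how much ground one Web Mercator tile covers at each zoom. The per-unit size assumes a vector tile extent of 4096 units per tile, which is an assumption here and may not match what the comparator's tiles use; the function names are illustrative.

```javascript
const EARTH_CIRCUMFERENCE_M = 40075016.686; // equatorial circumference (WGS84)

// Ground width of a single tile at the given zoom and latitude.
function tileWidthMeters(zoom, latitudeDeg = 0) {
  const latRad = (latitudeDeg * Math.PI) / 180;
  return (EARTH_CIRCUMFERENCE_M * Math.cos(latRad)) / Math.pow(2, zoom);
}

// Ground size of one tile coordinate unit, assuming `extent` units per tile.
function unitSizeMeters(zoom, latitudeDeg = 0, extent = 4096) {
  return tileWidthMeters(zoom, latitudeDeg) / extent;
}

console.log(tileWidthMeters(16).toFixed(1)); // roughly 611 m at the equator
console.log(tileWidthMeters(19).toFixed(1)); // roughly 76 m at the equator
console.log(unitSizeMeters(16).toFixed(2)); // roughly 0.15 m per unit under the 4096 assumption
```

So a zoom 16 tile is 8 times wider than a zoom 19 tile, and any coordinate rounding inside the tile is 8 times coarser. Whether that is enough to produce wrong flags would need checking against real cases.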

@geohacker
Contributor

Thank you @manoharuss! So looks like we have two major problems to address here:

  1. Avoid comparing different versions of the same feature.
  2. Take into account features that are modified in the same changeset before comparing for overlap.
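
A minimal sketch of what both fixes could look like when building the list of features to compare against. The feature shape (`id`, `version`, `deleted`) and the function name are assumptions rather than the comparator's actual API, and ids are assumed to include the element type (e.g. 'way/123') so that they are unique.

```javascript
function comparisonCandidates(target, existingFeatures, changesetFeatures) {
  // Index features touched in the changeset by id, so their new state wins.
  const updated = new Map(changesetFeatures.map((f) => [f.id, f]));
  const candidates = [];
  const seen = new Set();

  for (const feature of existingFeatures) {
    // Fix 1: never compare the target with another version of itself.
    if (feature.id === target.id) continue;
    // Fix 2: if the feature was modified in this changeset, use that state.
    const current = updated.get(feature.id) || feature;
    seen.add(current.id);
    if (!current.deleted) candidates.push(current);
  }

  // Features created in this changeset are not in the existing data yet.
  for (const feature of changesetFeatures) {
    if (feature.id === target.id || seen.has(feature.id) || feature.deleted) continue;
    candidates.push(feature);
  }
  return candidates;
}

const target = { id: 'way/1', version: 2 };
const existing = [{ id: 'way/1', version: 1 }, { id: 'way/2', version: 1 }, { id: 'way/3', version: 4 }];
const changeset = [target, { id: 'way/2', version: 2 }, { id: 'way/4', version: 1 }];
console.log(comparisonCandidates(target, existing, changeset).map((f) => f.id + ' v' + f.version));
```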
