-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[similarity] add similarity distance #1176
base: develop
Are you sure you want to change the base?
[similarity] add similarity distance #1176
Conversation
A rephrased description, with a bit of AI help: To compute the Average Similarity Distance (ASD) between two linestrings, let's consider linestring
The resulting ASD value provides an average distance measure that captures the geometric similarity between the linestrings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!! That's a new algorithm right? There is a similar work where they compute the area between two linestrings but in a slightly different way https://link.springer.com/article/10.1007/s12289-018-1421-8 (ends up in a different area).
A did a first parse and put some comments in the code.
Is there cases where other measures (Fréchet or Hausdorff) is better than the new one? How the new method perform when the inputs differ a lot between them?
auto get_projection(std::size_t index, Point const& point, Geometry const& target) | ||
{ | ||
using coor_t = typename coordinate_type<Point>::type; | ||
using closest_strategy = geometry::strategy::closest_points::detail::compute_closest_point_to_segment<coor_t>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should passed as a function parameter following a similar mechanism as in all other algorithms.
auto const& current = target[i]; | ||
auto const& next = target[i + 1]; | ||
projection<Point> other; | ||
auto const pp = closest_strategy::apply(point, current, next); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very similar to how closest points are computed for a linestring. I guess you are not using that algorithm because you want to keep track also the index. Maybe the closest_points procedure can be enhanced with this extra information (e.g. providing the index) and reused here?
From performance point of view there is some duplication in the computation of the closest point here with the distance computed in two lines below. Also, you may consider using the comparable distance instead of the real distance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at it. It seems that some code duplication is also there in the detail::closest_feature::point_to_point_range
function, where the closest segment is determined based on (comparable) distance. Which, internally, already finds the projection - but it is discarded (in the strategy itself).
Then the closest point is calculated.
Here I do it the other way round, calling first closest point strategy (which already does some calculations to find it - but does not calculate the distance yet) and then calculate its distance (I could use comparable instead - changed it right now).
I'm not sure if it is worth the effort to harmonize this code for this reason only, it will make it more complex.
What could be useful is a strategy which gives all the internal information to avoid code duplication in other places. Something like that is already in extensions (calculate_distance_info
) but never made it. So maybe also that is not really worth the effort...
auto const sum_area = get_areal_sum(boost::begin(p_enriched), boost::end(p_enriched), | ||
boost::begin(q_enriched), boost::end(q_enriched), ring, visitor); | ||
|
||
auto const length = (std::min)(geometry::length(p), geometry::length(q)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least for the reversed linestring the length is already computed (maybe this can be reused here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that, unless you mean the distance calculation done for reversal.
int const side_p0_q = side_strategy::apply(q0.point, q1.point, p0.point); | ||
int const side_p1_q = side_strategy::apply(q0.point, q1.point, p1.point); | ||
|
||
if (side_q0_p == 0 && side_q1_p == 0 && side_p0_q == 0 && side_p1_q == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't side_q0_p == 0 && side_q1_p == 0
be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should indeed be enough, at least for the cases I tested. However, I'm not sure about FP-precision where there are sometimes weird results (such as 3 times 0 and one non-zero). And the sides have to be calculated later (used later) anyway.
But with those FP-precision errors the area is probably minimal - so it probably won't make a real difference.
I will think about what my final preference is here.
} | ||
|
||
template <typename Projection, typename Ring> | ||
bool fill_quadrilateral(Projection const& p0, Projection const& p1, Projection const& q0, Projection const& q1, Ring& ring) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it always a quadrileteral? In some cases could be a triangle or even degenerated to a segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it can. But sorting that out is probably not worth it, because the area will be the same.
Thanks, that's interesting! AFAIU the CS-specific operations are closest points, side and area so technically this distance should work in all coordinate systems correct? Do I understand correctly that you wanted to show a prototype to start a discussion and you plan to implement it with Boost.Geometry structure (strategy, dispatch, etc.) if we agree that this is a good addition to the library? Is ASD an official name for this measure? |
#include <algorithm> | ||
#include <iterator> | ||
#include <utility> | ||
#include <limits> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall that this order of headers is prefered in general. In our case this would be Boost.Geometry headers, then other Boost headers, then standard headers. Do we want to make a guideline about this?
I'd add one thing. The headers are not sorted alphabetically. They probably should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that in other file the order is different so I'm curious why were they moved here?
#include <cstdint> | ||
#include <vector> | ||
|
||
#include <boost/geometry/core/point_type.hpp> | ||
#include <boost/geometry/algorithms/distance.hpp> | ||
#include <boost/geometry/algorithms/length.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here standard headers are before Boost.Geometry headers though.
Thank you both for your comments! Cool! |
To clarify, I'm considering whether or not giving it such general name is a good idea. There are various measures of similarity after all. And AFAIU they are not interchangeable. Unless it would be possible to define some kind of standard of similarity allowing to generate comparable results between various algorithms? So the question is which one if any should be considered as the default one in the library? If it's not possible to answer this question then maybe this algorithm should have a specific name like the other ones. |
Yes, correct, I added a call for geographic coordinate system as well.
Yes, indeed
Okay, I now have |
Fixes: issue boostorg#1186
Would it be possible to calculate the similarity of areal geometries (rings or polygons) this way, i.e. pass them into this algorithm and internally calculate the similarity of their boundaries? If it is then this name would not work for them because the average distance of two intersecting polygons would always be 0. |
This is a new similarity distance, comparable to Hausdorff or Frechet.
It calculates the area between two lines and divides them by the line length. This is an average distance, measuring quite well how similar two lines are. It is less sensible to outliers (such as Hausdorff/Frechet) and not discrete.
For example this case (west1/west2):
They are quite similar. Distance impressions are:
Frechet: 53.4426 Hausdorff: 53.4426 Average: 5.61308
How it works in detail:
(See the comment below for an enhanced description)
Zoomed in:
Apart from this, this PR also adds an experimental
geojson_writer
If you like it, I'll also add a version for rings. For polygons/multi it is a bit more complex (especially if the number of objects are unequal), to be worked out.
The
geojson_writer
is really useful for visualizing results (or debug steps). But it is not completely finished either.Complete output of the test program:
Showing that:
I also tried to give a "partial" result, which happens if one line is partly identical to the other line. This is possible, but not yet finished. It uses the projections and then recursively finds the best division point. But this is not part of the PR.