Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG FIX] getCorrespondenceScore must not go out-of-bounds. #6070

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fghoussen
Copy link

@fghoussen fghoussen commented Jun 22, 2024

[BUG FIX] getCorrespondenceScore must not go out-of-bounds.

I have a quite systematic crash when using pcl::registration::CorrespondenceRejectorMedianDistance or pcl::registration::CorrespondenceRejectorSurfaceNormal.

I use it like so:

   std::unique_ptr<pcl::IterativeClosestPoint<PointNT, PointNT, double>> icp;
   icp = std::make_unique<pcl::IterativeClosestPoint<PointNT, PointNT, double>>();

   pcl::registration::CorrespondenceRejectorMedianDistance::Ptr rejMedDist;
   rejMedDist.reset (new pcl::registration::CorrespondenceRejectorMedianDistance);
   rejMedDist->setInputTarget<PointNT> (cloud_mst_normal);
   rejMedDist->setInputSource<PointNT> (cloud_initial_slv_normal);
   rejMedDist->setSearchMethodTarget<PointNT> (tree_mst_normal);
   rejMedDist->setMedianFactor (medianDistanceFactor);
   icp->addCorrespondenceRejector (rejMedDist);

If any mistake (explaining crashes ?) is obvious, I'd be glad to know.
In any cases, the attached patch seem to make things better (less often crash) but does not solve the problem (crash occurs on a regular basis).

The patch is basically "each time I access a member, check the index is consistent with the member size, or, return default value (here maximum value to say the score is as bad as possible AFAIU the code)"

I pushed this to let the PCL community decide if this patch is worth being merged or not.

@fghoussen
Copy link
Author

Forgot to mention, at my side, on several different point clouds as inputs, even with this fix, crashes occurs around here: https://github.com/fghoussen/pcl/blob/d257a89130386c3356fba8ba0f04e8e2089704f3/registration/src/correspondence_rejection_median_distance.cpp#L59

Not sure how this is safe when distance are max'ed by bad correspondence score

@mvieth
Copy link
Member

mvieth commented Jun 23, 2024

CorrespondenceRejectorMedianDistance is successfully tested here: https://github.com/PointCloudLibrary/pcl/blob/master/test/registration/test_registration.cpp#L262 source, target, etc are set within ICP (for the source this is strictly necessary since it changes from iteration to iteration).

In any cases, the attached patch seem to make things better (less often crash) but does not solve the problem (crash occurs on a regular basis).

If you say that it does not really solve the problem, I would rather investigate how the crash actually happens, and then create a fix based on that. Can you provide a full program that I can use to easily reproduce the crash? Including source and target cloud, e.g. from the pcl test directory,

@fghoussen
Copy link
Author

fghoussen commented Jun 23, 2024

With my data, following examples you mentioned, I get crash on a regular basis.
This patch (with added commits) seems to make thing a lot-lot more stable (does not crash anymore after 15+ successive try): this is only checking bounds before accessing elements in lists.

Can you provide a full program that I can use to easily reproduce the crash? Including source and target cloud

Unfortunately not that easy: I can't provide data (not mine - need to check if I can share them) and they are big files (not great to commit in repo IMO), the original code can not be compiled in full debug (= difficult to analyse crash as you do not get all symbols)

@mvieth
Copy link
Member

mvieth commented Jun 24, 2024

With my data, following examples you mentioned, I get crash on a regular basis. This patch (with added commits) seems to make thing a lot-lot more stable (does not crash anymore after 15+ successive try): this is only checking bounds before accessing elements in lists.

I don't think it is a good idea to merge a change if the bug is not really understood. This seems to treat the symptoms, but potentially does not solve the bug itself. In the worst case, these checks could hide a more severe bug somewhere else in the code.

Can you provide a full program that I can use to easily reproduce the crash? Including source and target cloud

Unfortunately not that easy: I can't provide data (not mine - need to check if I can share them) and they are big files (not great to commit in repo IMO), the original code can not be compiled in full debug (= difficult to analyse crash as you do not get all symbols)

In case you can't share the data, you could try to use a point cloud from the pcl test directory ( https://github.com/PointCloudLibrary/pcl/tree/master/test ) or from the pcl data repo ( https://github.com/PointCloudLibrary/data ). How many points do the point clouds have that you use? I am asking because if it is a certain number (several millions or even billions), this might point to a specific kind of bug.
Since CorrespondenceRejectorMedianDistance seems to be used successfully in test_registration.cpp, I think the main question is: what is causing the bug for you? Is it the data you are using? The parameters/settings you are using? Something else? I can only help investigating the bug if you help me reproduce it. If you could share a more complete program to reproduce the bug (starting from loading the point clouds from files to calling align, including all parameters you set), that would already be very helpful.

@fghoussen
Copy link
Author

fghoussen commented Jun 24, 2024

I'll see what I can do... Look closely at the patch (hidding whitespaces): this is only bound checking

I don't think it is a good idea to merge a change if the bug is not really understood.

I understand your point of view. But the problem is now understood: after bound checking, the code is stable.
I will likely not be able to go deeper in the analysis: gdb reported index = 26879 and (*input_).size() = 15698 on my data... And I have no idea why?!.. But a least, with this patch, the code doesn't crash anymore

In case you can't share the data

Not sure. Need to check. This will be time consuming too...

what is causing the bug for you? Is it the data you are using?

The problem seems to lie in the data (= point cloud). Reducing point cloud to minimal setting is impossible (I do not know what trigger the problem in the PCL code) and modifying point cloud will be time consuming too (possibly making the problem disappear).

you could share a more complete program to reproduce the bug

Once again, not sure I'll have time for this: I already spent more time than expected. I'll see what I can do...

@fghoussen
Copy link
Author

Adding a unit test with data from https://github.com/PointCloudLibrary/pcl/tree/master/test may or may not reproduce the problem. Could you add one on top of this PR? If not, I'll see if I have time... But I can't guarantee

@fghoussen
Copy link
Author

The problem is maybe silent in the current test suite: how to run the test suite?
cmake -DBUILD_benchmarks=ON -DBUILD_examples=ON -DBUILD_simulation=ON -DBUILD_global_tests=ON ..; make all testdoes not trigger the test suite

@mvieth
Copy link
Member

mvieth commented Jun 24, 2024

The problem is maybe silent in the current test suite: how to run the test suite? cmake -DBUILD_benchmarks=ON -DBUILD_examples=ON -DBUILD_simulation=ON -DBUILD_global_tests=ON ..; make all testdoes not trigger the test suite

cmake --build . -- tests or make tests

@fghoussen
Copy link
Author

The problem is maybe silent in the current test suite

I've just tested: the test suite doesn't see the problem which confirm it seems related to the point clouds used as input

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants