
Conversation


@Adyansh04 Adyansh04 commented Aug 29, 2025


Basic Info

Ticket(s) this addresses: #5442
Primary OS tested on: Ubuntu
Robotic platform tested on: Gazebo TurtleBot3
Does this PR contain AI generated software? No
Was this PR description generated by AI software? No

Description of contribution in a few bullet points

  • Optimized collision checking performance: Replaced runtime line iterator computations with precomputed cell offsets for each orientation bin
  • Improved memory access patterns: Sequential array access instead of scattered Bresenham algorithm computations
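The precomputation idea can be sketched roughly as follows (a minimal, self-contained sketch with hypothetical names, not the actual GridCollisionChecker code): for each orientation bin, rotate the footprint once and rasterize its outline into integer cell offsets, so a later collision query is a sequential walk over a flat array instead of running Bresenham per check.

```cpp
#include <cmath>
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

// A footprint vertex in metres, and a grid offset in cells.
struct Point { double x, y; };
struct Offset { int dx, dy; };

const double kPi = std::acos(-1.0);

// Rasterize the segment (x0,y0)-(x1,y1) with Bresenham, collecting cells.
void bresenham(int x0, int y0, int x1, int y1, std::set<std::pair<int, int>> & cells)
{
  int dx = std::abs(x1 - x0), dy = -std::abs(y1 - y0);
  int sx = x0 < x1 ? 1 : -1, sy = y0 < y1 ? 1 : -1, err = dx + dy;
  while (true) {
    cells.insert({x0, y0});
    if (x0 == x1 && y0 == y1) {break;}
    int e2 = 2 * err;
    if (e2 >= dy) {err += dy; x0 += sx;}
    if (e2 <= dx) {err += dx; y0 += sy;}
  }
}

// Precompute, for each orientation bin, the grid-cell offsets covered by the
// rotated footprint outline. Done once at setup; reused for every query.
std::vector<std::vector<Offset>> precomputeOffsets(
  const std::vector<Point> & footprint, int num_bins, double resolution)
{
  std::vector<std::vector<Offset>> table(num_bins);
  for (int bin = 0; bin < num_bins; ++bin) {
    double theta = 2.0 * kPi * bin / num_bins;
    double c = std::cos(theta), s = std::sin(theta);
    std::set<std::pair<int, int>> cells;  // set deduplicates shared corner cells
    for (size_t i = 0; i < footprint.size(); ++i) {
      const Point & a = footprint[i];
      const Point & b = footprint[(i + 1) % footprint.size()];
      int ax = static_cast<int>(std::round((a.x * c - a.y * s) / resolution));
      int ay = static_cast<int>(std::round((a.x * s + a.y * c) / resolution));
      int bx = static_cast<int>(std::round((b.x * c - b.y * s) / resolution));
      int by = static_cast<int>(std::round((b.x * s + b.y * c) / resolution));
      bresenham(ax, ay, bx, by, cells);
    }
    for (const auto & cell : cells) {
      table[bin].push_back({cell.first, cell.second});
    }
  }
  return table;
}
```

At query time the check reduces to looking up `table[angle_bin]` and reading the costmap at `(cx + dx, cy + dy)` for each offset, which is what gives the sequential access pattern described above.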

Description of documentation updates required from your changes

None

Description of how this change was tested

  • Tested in Gazebo using tb3_simulation_launch.py with circular and rectangular footprints.
  • Performed linting validation using colcon test.

Future work that may be required in bullet points

  • Performance benchmarking

For Maintainers:

  • Check that any new parameters added are updated in docs.nav2.org
  • Check that any significant change is added to the migration guide
  • Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins are added to the plugins page
  • If a BT node, additionally: add to the BT's XML index of nodes for Groot, the BT package's README table, and the BT library lists
  • Should this be backported to current distributions? If so, tag with backport-*.

@Adyansh04 Adyansh04 changed the title modify GridCollisionChecker to use precomputed collision cells and op… Optimizes collision checking performance Aug 29, 2025

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.

Files with missing lines                      Patch %   Lines
nav2_smac_planner/src/collision_checker.cpp   97.22%    1 Missing ⚠️

Files with missing lines                      Coverage Δ
nav2_smac_planner/src/collision_checker.cpp   98.64% <97.22%> (-1.36%) ⬇️

... and 3 files with indirect coverage changes


@SteveMacenski
Member

@Adyansh04 can you run some experiments so we can get some metrics about how much faster this potentially is? You'll want to use a different robot model (you could make up a non-circular footprint) for the global costmap in a larger space like the depot/warehouse TB4 world. Then, run this many times (~1000x) over some random start/goal poses. You can probably use the tools/planner_benchmarking scripts to do / aid you in this. Give it a map + footprint and let it run. Then repeat for your PR branch to see how much faster it is. The other metrics like path length, average/max cost should be identical

@Adyansh04
Author

@SteveMacenski I used tools/planner_benchmarking as you suggested. After an initial warm-up I ran each variant 3 times and averaged the results.

Results — planner-level (average time, seconds)

Planner       Old (s)   New (s)
SmacHybrid    0.04199   0.04145
SmacLattice   0.05413   0.05282

A very small improvement overall.

Why the improvement is small — counters from inCollision

I added counters to inCollision. Most calls exit on the first inflation check, so the expensive path is rarely reached.

SmacHybrid

Metric                    Count        %
Total inCollision calls   11,766,111
Early exit (inflation)    11,749,724   99.86%
Footprint no collision    10,360       0.09%
Footprint collision       5,099        0.04%
Lethal center             928          0.01%

SmacLattice

Metric                    Count        %
Total inCollision calls   45,378,398
Early exit (inflation)    45,218,856   99.65%
Footprint no collision    90,469       0.20%
Footprint collision       29,734       0.07%
Lethal center             39,339       0.09%

So >99% of calls bail out at the inflation check — that limits any payoff from optimizing the deeper collision logic.

```cpp
// if footprint, then we check for the footprint's points, but first see
// if the robot is even potentially in an inscribed collision
if (center_cost_ < possible_collision_cost_ && possible_collision_cost_ > 0.0f) {
  return false;
}
```
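The branch counters above can be gathered with a pattern along these lines (a sketch with hypothetical names and illustrative cost thresholds, not the actual GridCollisionChecker members):

```cpp
#include <cstdint>

// Hypothetical counters mirroring the exit branches of inCollision.
struct CollisionStats {
  uint64_t total = 0;
  uint64_t early_exit_inflation = 0;
  uint64_t footprint_free = 0;
  uint64_t footprint_hit = 0;
  uint64_t lethal_center = 0;
};

// Toy stand-in for the real check: classify a query by its center cost.
// 254 stands for a lethal cell; the thresholds are illustrative only.
bool inCollisionCounted(
  float center_cost, float possible_collision_cost,
  bool footprint_hits, CollisionStats & stats)
{
  ++stats.total;
  if (center_cost >= 254.0f) {          // lethal center cell
    ++stats.lethal_center;
    return true;
  }
  if (center_cost < possible_collision_cost && possible_collision_cost > 0.0f) {
    ++stats.early_exit_inflation;       // cheap early exit, no footprint walk
    return false;
  }
  if (footprint_hits) {                 // full footprint check found a hit
    ++stats.footprint_hit;
    return true;
  }
  ++stats.footprint_free;               // full footprint check, no collision
  return false;
}
```

Dumping the struct after a benchmark run yields percentages like the tables above.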

Microbenchmarks (direct calls)

I wrote a custom micro-benchmark to isolate and measure the two functions directly: inCollision and setFootprint.

setFootprint (multiple calls)

Version   Avg Time (ms)
Old       0.002
New       0.063

inCollision (multiple calls)

Version   Avg Time (ms)
Old       4.641
New       2.369
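The harness was shaped roughly like this (a sketch, not the exact code used for the numbers above): a warm-up pass, then many timed calls through std::chrono::steady_clock, reporting the average per-call cost.

```cpp
#include <chrono>
#include <functional>

// Time `iterations` calls of `fn` and return the average per-call cost in ms.
// A short warm-up pass runs first so cold caches don't skew the samples.
double averageMs(const std::function<void()> & fn, int iterations)
{
  for (int i = 0; i < iterations / 10 + 1; ++i) {fn();}  // warm-up
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {fn();}
  auto stop = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

Each variant is then measured as, e.g., `averageMs([&]() {checker.inCollision(x, y, theta, true);}, 10000)`, with identical inputs for the old and new code paths.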

@SteveMacenski
Member

SteveMacenski commented Sep 11, 2025

which map did you run this on? Can you try running it in the depot world from the TB4 default bringup? Also, did you set the footprint to be non-circular (just verifying)? The default benchmark world with the random obstacles leaves a lot of free space, so it is not necessarily representative of the number of collision checks that would happen due to obstacles in the way / the better-guided heuristics of a more structured space. The more confined the space, the more checks would need the full footprint collision.

Something to keep in mind @Adyansh04 is that the number of calls doesn't necessarily correlate with where the computation time is spent, so call counts are possibly not a good proxy. If the early exit only checks a single point, but the full check inspects ~120 cells (for a 2x1 m robot) and possibly incurs cache misses in the costmap data structure, it can take a disproportionately larger amount of time than the ~0.15-0.3% of the calls would imply. Maybe clocks would be a better metric (though it would slow down the system a bit, since calling the clock to measure time isn't 'free', it should at least give you a general sense) to know where time and effort is best spent on optimizing + the impact of this change.

@tonynajjar would you be able to test quickly on your benchmarking rig?

@Adyansh04
Author

@SteveMacenski Thanks for the detailed feedback. My counter-based analysis was definitely missing the full picture.

Based on your suggestions, here’s my plan for a more thorough analysis:

  • Change Test Environment: Earlier I was using the default random-obstacle map, which has lots of free space, with a rectangular footprint the size of a TurtleBot3. I'll switch to the depot world from the TB4 bringup. This should create more constrained scenarios and trigger the full footprint check more frequently. I will also use a bigger rectangular footprint.

  • Profile CPU Cycles with perf: Instead of relying on call counters, I'll use perf record on the planner server for each SMAC planner variant. This will show the actual CPU cycles used in the different code paths and tell us what percentage of total planning time it represents. This should give us the ground truth on the time cost, just as you suggested.

  • Analyze Cache Performance: I'll also run a perf session to specifically measure cache-references and cache-misses. This should quantify the cache locality behaviour.

Does this approach sound reasonable? I can share the depot world results and detailed perf analysis once I have them.

@SteveMacenski
Member

SteveMacenski commented Sep 15, 2025

yup! I would just make sure that you still compile with optimizations. If you use debug for GDB then you'll end up measuring non-optimized versions of functions, which may or may not represent the actual problems once compiled with optimizations. That's why I suggested the clocks, but you can also do it other ways as well without messing with the optimizations. It's a bit of a chicken-and-egg problem: you profile with optimizations to know the actual hot spots, then use debug to refine and optimize iteratively with more information about the impact.

Sometimes, depending on the complexity of the code and how much you plan to touch, clocks inline are a reasonable way to go. It'll slow things down because of the clock, but should at least still be proportionate to the number of calls. If I'm doing blind optimization where I don't know the hotspots or will be modifying vast sections of code, then using GDB and perf is obviously better to know where to look.
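An inline-clock accumulator of the kind suggested could look like this (a sketch; the class name and usage are hypothetical, not existing nav2 code). A scope guard sums the wall time spent inside a hot region across many calls, so the total can be compared against overall planning time.

```cpp
#include <chrono>

// Accumulates wall time spent inside a hot region across many calls.
// The clock call itself has a cost, so treat the totals as proportional
// rather than exact, per the caveat raised in the discussion above.
class ScopedAccumulator
{
public:
  explicit ScopedAccumulator(std::chrono::nanoseconds & total)
  : total_(total), start_(std::chrono::steady_clock::now()) {}

  ~ScopedAccumulator()
  {
    total_ += std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start_);
  }

private:
  std::chrono::nanoseconds & total_;
  std::chrono::steady_clock::time_point start_;
};
```

Usage would be a one-liner at the top of the function of interest, e.g. `ScopedAccumulator t(in_collision_total_ns_);` inside inCollision, with the accumulated total logged once per planning request.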
