Optimizes collision checking performance #5497
base: main
…timize footprint checks Signed-off-by: Adyansh04 <[email protected]>
Codecov Report
❌ Patch coverage is
... and 3 files with indirect coverage changes
@Adyansh04 can you run some experiments so we can get some metrics about how much faster this potentially is? You'll want to use a different robot model (you could make up a non-circular footprint) for the global costmap in a larger space like the depot/warehouse TB4 world. Then, run this many times (~1000x) over some random start/goal poses. You can probably use the |
@SteveMacenski I used
Results: planner-level (average time, seconds)
A very small improvement overall.
Why the improvement is small: counters from inCollision
I added counters to
SmacHybrid
SmacLattice
So >99% of calls bail out at the inflation check, which limits any payoff from optimizing the deeper collision logic.
// if footprint, then we check for the footprint's points, but first see
// if the robot is even potentially in an inscribed collision
if (center_cost_ < possible_collision_cost_ && possible_collision_cost_ > 0.0f) {
  return false;
}
Microbenchmarks (direct calls)
I wrote a custom micro-benchmark to isolate and measure the performance of the specific logic,
setFootprint (multiple calls)
inCollision (multiple calls)
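The early exit quoted above can be sketched in isolation to show how such counters might be collected. This is a minimal illustration, not the nav2_smac_planner implementation: `CollisionCounters`, `inCollisionSketch`, and the `footprint_costs` vector (a stand-in for costmap lookups under the footprint) are hypothetical names introduced here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical counters; not part of the Nav2 API.
struct CollisionCounters {
  std::uint64_t early_exits = 0;  // calls resolved by the center-cost test alone
  std::uint64_t full_checks = 0;  // calls that had to walk the footprint cells
};

// center_cost: costmap cost under the robot center.
// possible_collision_cost: lowest center cost at which the footprint could
//   touch an obstacle (<= 0 means "unknown", so the early exit is disabled).
// footprint_costs: assumed stand-in for the costs under the footprint cells.
// lethal_cost: cost at or above which a cell counts as a collision.
bool inCollisionSketch(
  float center_cost, float possible_collision_cost,
  const std::vector<float> & footprint_costs, float lethal_cost,
  CollisionCounters & counters)
{
  // Same condition as the quoted snippet: the center is far enough from
  // obstacles that no footprint point can be in collision.
  if (center_cost < possible_collision_cost && possible_collision_cost > 0.0f) {
    ++counters.early_exits;
    return false;
  }

  // Otherwise fall back to checking every footprint cell.
  ++counters.full_checks;
  for (float cost : footprint_costs) {
    if (cost >= lethal_cost) {
      return true;
    }
  }
  return false;
}
```

Dividing `early_exits` by the total number of calls gives the kind of bail-out percentage reported in this comment; it says nothing yet about how much time each branch takes.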
Which map did you run this on? Can you try running it in the depot world for the TB4 default bringup? Also, did you set the footprint to be non-circular (just verifying)? The default benchmark world with random obstacles leaves a lot of free space, so it is not necessarily representative of the number of collision checks that would happen due to obstacles in the way or the better-guided heuristics of a more structured space. The more confined the space, the more checks would need the full footprint collision check.
Something to keep in mind @Adyansh04 is that the number of calls doesn't necessarily correlate with where the computation time is spent, so it is possibly not a good proxy. If the early exit only checks a single point, but the full check checks ~120 (for a 2x1m robot) and possibly has cache misses in the costmap data structure, it can take a disproportionately larger amount of time than the ~0.15-0.3% of the calls would imply. Clocks might be a better metric for finding where time and effort are best spent on optimization and for measuring the impact of this change (they would slow the system down a bit, since calling the clock to measure time isn't 'free', but they should at least give you a general sense).
@tonynajjar would you be able to test quickly on your benchmarking rig?
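To make the call-count versus time argument concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that a full check costs about as many units as the cells it reads (~120) and an early exit costs 1 unit; cache misses are ignored and would only push the share higher. The function name `fullCheckTimeShare` is hypothetical.

```cpp
// Estimated fraction of total collision-checking time spent in full checks.
// full_call_fraction: share of calls that run the full footprint check.
// cost_ratio: assumed cost of one full check relative to one early exit.
double fullCheckTimeShare(double full_call_fraction, double cost_ratio)
{
  const double full_time = full_call_fraction * cost_ratio;
  const double early_time = (1.0 - full_call_fraction) * 1.0;
  return full_time / (full_time + early_time);
}
```

Under these assumptions, 0.15% and 0.3% of the calls would account for roughly 15% and 26.5% of the time respectively (`fullCheckTimeShare(0.0015, 120.0)` and `fullCheckTimeShare(0.003, 120.0)`), which is why call counts alone can understate where the time goes.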
@SteveMacenski Thanks for the detailed feedback. My counter-based analysis was definitely missing the full picture. Based on your suggestions, here’s my plan for a more thorough analysis:
Does this approach sound reasonable? I can share the depot world results and detailed perf analysis once I have them.
Yup! I would just make sure that you still compile with optimizations. If you use a debug build for GDB, then you'll end up measuring non-optimized versions of functions, which may or may not represent the actual problems once compiled with optimizations. That's why I suggested the clocks, but you can also do it in other ways without messing with the optimizations. It's a bit of a chicken-and-egg problem: profile with optimizations to find the actual hot spots, then use debug builds to refine and optimize iteratively with more information about the impact. Sometimes, depending on the complexity of the code and how much you plan to touch, inline clocks are a reasonable way to go. They'll slow things down because of the clock calls, but the overhead should at least still be proportionate to the number of calls. If I'm doing blind optimization where I don't know the hot spots or will be modifying vast sections of code, then using GDB and perf is obviously better to know where to look.
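One way the inline clocks mentioned above could look in an optimized build is a small RAII timer around the function of interest. This is only a sketch using `std::chrono::steady_clock`; `TimingStats`, `ScopedTimer`, `sumCells`, and `timedSumCells` are hypothetical names, and `sumCells` merely stands in for whatever is being measured.

```cpp
#include <chrono>
#include <cstdint>

// Accumulated timing for one instrumented code path.
struct TimingStats {
  std::uint64_t calls = 0;
  std::chrono::nanoseconds total{0};
};

// Adds the lifetime of this object to the given stats when it goes out of scope.
class ScopedTimer {
public:
  explicit ScopedTimer(TimingStats & stats)
  : stats_(stats), start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer()
  {
    stats_.total += std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start_);
    ++stats_.calls;
  }

  ScopedTimer(const ScopedTimer &) = delete;
  ScopedTimer & operator=(const ScopedTimer &) = delete;

private:
  TimingStats & stats_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical stand-in for the work being measured: sums 0..n-1.
int sumCells(int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += i;
  }
  return sum;
}

// Instrumented wrapper: the timer stops after the return value is computed.
int timedSumCells(int n, TimingStats & stats)
{
  ScopedTimer timer(stats);
  return sumCells(n);
}
```

Keeping one `TimingStats` per branch (for example, early exit versus full footprint check) would give a time split to compare against the call counts, at the cost of two clock reads per call.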
Basic Info
Description of contribution in a few bullet points
Description of documentation updates required from your changes
None
Description of how this change was tested
tb3_simulation_launch.py with circular and rectangular footprints.
Future work that may be required in bullet points