Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmozing the API of the usual and gpu octree to faciliate swapping implementations easily #4818

Open
FabianSchuetze opened this issue Jun 27, 2021 · 6 comments
Labels
kind: request Type of issue status: triage Labels incomplete

Comments

@FabianSchuetze
Copy link
Contributor

When working with the GPU and CPU versions of the octree class, I had to make several modifications to my codebase before running the same data on both host and device. In this ticket, I would first like to document the necessary changes based on the (abridged) octree tutorial. Furthermore, I thought of a few ways to harmonize APIs to facilitate swapping different implementations easily.

The (abridged) tutorial I run was:

#include <pcl/octree/octree_search.h>
#include <pcl/point_cloud.h>

#include <ctime>
#include <iostream>
#include <vector>

int main() {
    srand((unsigned int)time(NULL));

    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(
        new pcl::PointCloud<pcl::PointXYZ>);

    // Generate pointcloud data
    cloud->width = 1000;
    cloud->height = 1;
    cloud->points.resize(cloud->width * cloud->height);

    for (std::size_t i = 0; i < cloud->size(); ++i) {
        (*cloud)[i].x = 1024.0f * rand() / (RAND_MAX + 1.0f);
        (*cloud)[i].y = 1024.0f * rand() / (RAND_MAX + 1.0f);
        (*cloud)[i].z = 1024.0f * rand() / (RAND_MAX + 1.0f);
    }

    float resolution = 128.0f;

    pcl::octree::OctreePointCloudSearch<pcl::PointXYZ> octree(resolution);

    octree.setInputCloud(cloud);
    octree.addPointsFromInputCloud();

    pcl::PointXYZ searchPoint;

    searchPoint.x = 1024.0f * rand() / (RAND_MAX + 1.0f);
    searchPoint.y = 1024.0f * rand() / (RAND_MAX + 1.0f);
    searchPoint.z = 1024.0f * rand() / (RAND_MAX + 1.0f);

    std::vector<int> pointIdxRadiusSearch;
    std::vector<float> pointRadiusSquaredDistance;

    float radius = 256.0f * rand() / (RAND_MAX + 1.0f);

    std::cout << "Neighbors within radius search at (" << searchPoint.x << " "
              << searchPoint.y << " " << searchPoint.z
              << ") with radius=" << radius << std::endl;

    if (octree.radiusSearch(searchPoint, radius, pointIdxRadiusSearch,
                            pointRadiusSquaredDistance) > 0) {
        for (std::size_t i = 0; i < pointIdxRadiusSearch.size(); ++i)
            std::cout << "    " << (*cloud)[pointIdxRadiusSearch[i]].x << " "
                      << (*cloud)[pointIdxRadiusSearch[i]].y << " "
                      << (*cloud)[pointIdxRadiusSearch[i]].z
                      << " (squared distance: " << pointRadiusSquaredDistance[i]
                      << ")" << std::endl;
    }
}

The GPU version including the necessary revisions is:

#include <pcl/octree/octree_search.h>
#include <pcl/point_cloud.h>
#include <pcl/gpu/octree/octree.hpp>

#include <ctime>
#include <iostream>
#include <vector>

int main() {
    srand((unsigned int)time(NULL));

    //pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(
        //new pcl::PointCloud<pcl::PointXYZ>);

    pcl::gpu::DeviceArray<pcl::PointXYZ> cloud(1000);
    std::vector<pcl::PointXYZ> tmp(1000);

    // Generate pointcloud data
    //cloud->width = 1000;
    //cloud->height = 1;
    //cloud->points.resize(cloud->width * cloud->height);

    for (std::size_t i = 0; i < cloud.size(); ++i) {
        (tmp)[i].x = 1024.0f * rand() / (RAND_MAX + 1.0f);
        (tmp)[i].y = 1024.0f * rand() / (RAND_MAX + 1.0f);
        (tmp)[i].z = 1024.0f * rand() / (RAND_MAX + 1.0f);
    }
    cloud.upload(&tmp[0], 1000);

    //float resolution = 128.0f;
    //pcl::octree::OctreePointCloudSearch<pcl::PointXYZ> octree(resolution);
    pcl::gpu::Octree octree{};

    //octree.setInputCloud(cloud);
    octree.setCloud(cloud);
    //octree.addPointsFromInputCloud();
    octree.build();

    pcl::PointXYZ searchPoint;
    pcl::gpu::DeviceArray<pcl::PointXYZ> gpuSearchPointVec(1);

    searchPoint.x = 1024.0f * rand() / (RAND_MAX + 1.0f);
    searchPoint.y = 1024.0f * rand() / (RAND_MAX + 1.0f);
    searchPoint.z = 1024.0f * rand() / (RAND_MAX + 1.0f);
    gpuSearchPointVec.upload(&searchPoint, 1);

    // Neighbors within radius search

    //std::vector<int> pointIdxRadiusSearch;
    //std::vector<float> pointRadiusSquaredDistance;
    int max_answers = 1000;
    pcl::gpu::NeighborIndices results(gpuSearchPointVec.size(), max_answers);

    float radius = 256.0f * rand() / (RAND_MAX + 1.0f);

    std::cout << "Neighbors within radius search at (" << searchPoint.x << " "
              << searchPoint.y << " " << searchPoint.z
              << ") with radius=" << radius << std::endl;

    octree.radiusSearch(gpuSearchPointVec, radius, max_answers,
                            results);
}

Modification for GPU Octree
I could think of modifying the gpu::Octree module as follows:

  • Add the member functions setInputCloud and addPointsFromInputCloud to the GPU Octree module
  • Add another overload to the radiusSearch function returning the sequaredDistances too. A member function such as:
    void radiusSearch(const Queries& centers, float radius, int max_results, NeighborIndices& result, ResultSqrDists& sqr_distances) const might do that.
  • Do we need the max_results argument? When I used Octree at first, I found it difficult to specify the number, and I am still unsure why we need it.
  • Should we provide another overload that accepts one search point as a query instead of requiring a DeviceArray?

Modification for DeviceArray

  • For the DeviceArray, I would like to add a subscript operator to load a host point to the device. At the moment, we have to load points into a std::vector and upload these points afterward.

What do others think about the example and the modifications?

@FabianSchuetze FabianSchuetze added kind: request Type of issue status: triage Labels incomplete labels Jun 27, 2021
@kunaltyagi
Copy link
Member

  • Add the member functions setInputCloud to the GPU Octree module
  • add a subscript operator to load a host point to DeviceArray

Ideally, for speed, batch operations are preferred. In these cases, we need to figure out how to combine the single-operations into a batch op.

Do we need the max_results argument?
Add another overload to the radiusSearch function returning the sequaredDistances
Add the member functions addPointsFromInputCloud to the GPU Octree module

These sound like good ideas :)

@FabianSchuetze
Copy link
Contributor Author

Thanks for the great comments, Kunal! I will work on this and look forward to doing it :-)

@larshg
Copy link
Contributor

larshg commented Jun 28, 2021

Please have a look at #4338 from @haritha-j. I think those where close to be merged, but its some time since I last looked at it 😄

@FabianSchuetze
Copy link
Contributor Author

Thank you so much, Lars, for the comment. The work looks already great! @haritha-j it seems that you have done a lot of fantastic work already, and you are close to the finish line. Do you want to take another look at it?

@haritha-j
Copy link
Contributor

haritha-j commented Jun 30, 2021

Actually yes, I would quite like to finish this off, I should've done it earlier. I should have some free time after the next couple weeks to finish it off, it would be very nice to have the ability to get square distances from this.

Do we need the max_results argument?

This might require a fair bit of changes IMO because I believe it's used to allocate memory for storing results at present.

@FabianSchuetze
Copy link
Contributor Author

Fantastic, @haritha-j ! Thank you for finishing this! I am curious to see how it ends up.

Thank you also for your evaluation of the max_results parameter. I will also take a look to form a better opinion about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: request Type of issue status: triage Labels incomplete
Projects
None yet
Development

No branches or pull requests

4 participants