
Setup for Custom Robot and Objects


DPL provides an actionlib server for interfacing with your ROS-based system. The action message is defined in object_recognition_node/action/DoPerch.action. The input or "goal" is an array of object_ids to be localized, and the output or "result" is an array of corresponding object poses (transforms from the object models to the scene). If a solution could not be found for some reason, the actionlib server sets the GoalStatus to ABORTED.
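The action definition roughly corresponds to the sketch below; the field names and message types shown here are assumptions inferred from the description above, so consult DoPerch.action itself for the authoritative definition.

# DoPerch.action (illustrative sketch only)
# Goal: names of the objects to localize (object_name values from the objects XML file)
string[] object_ids
---
# Result: model-to-scene poses, one per requested object
geometry_msgs/PoseStamped[] object_poses
---
# Feedback: none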

There are three steps to set up DPL for your robot and objects:

  1. Prepare object 3D models and a meta-data config file,
  2. Determine your camera intrinsics (focal length, principal point) and add them to a config file,
  3. Specify the point cloud topic, reference frame and other environment parameters in a launch file.

Object Models and Meta-Data

You will need 3D models of the objects that you want to identify and localize. These should be in PLY or STL format, and can be in either millimeters or meters (but use the same unit for all objects). Also make sure your models have a reasonable resolution; overly dense meshes can significantly increase run time. For instance, we have used models with 1000 - 4000 vertices and 3000 - 6000 faces.

Next, create an XML file similar to sbpl_perception/config/ycb_objects.xml and add information about your objects. The mesh_in_mm parameter should be set to false if your models are in meters, and true otherwise. Then, for each object, add an entry with the following seven properties:

# Unique object ID
object_name: string

# Absolute path to object model 
model_file: string

# Whether 3D model is flipped about xy-plane
is_object_flipped: bool

# Whether object is symmetric about z-axis (unused)
is_object_rotationally_symmetric: bool

# Symmetry mode
# 0 - no symmetry about z-axis
# 1 - 180 deg symmetry about z-axis
# 2 - completely symmetric about z-axis 
symmetry_mode: int

# A translation discretization for searching (unused by default)
discretization_resolution: double (meters)

# Number of model variants (unused, set to 1)
num_models: int
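For illustration, an entry for one object might look like the sketch below; the exact tag names and layout are guesses, so mirror the structure used in sbpl_perception/config/ycb_objects.xml.

<?xml version="1.0"?>
<objects>
  <mesh_in_mm>false</mesh_in_mm>
  <!-- One entry per object; all values below are illustrative -->
  <object>
    <object_name>coffee_mug</object_name>
    <model_file>/home/user/models/coffee_mug.ply</model_file>
    <is_object_flipped>false</is_object_flipped>
    <is_object_rotationally_symmetric>true</is_object_rotationally_symmetric>
    <symmetry_mode>2</symmetry_mode>
    <discretization_resolution>0.02</discretization_resolution>
    <num_models>1</num_models>
  </object>
</objects>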

That's it -- this is all the training needed for recognizing your objects!

Camera Parameters

At this time, the codebase works only with depth sensors (e.g., the Microsoft Kinect or ASUS PrimeSense); support for LIDAR-type sensors might be added in the future. Since the DPL algorithms rely on the ability to render scenes, you'll need to provide the necessary camera parameters. To do so, create a file similar to sbpl_perception/config/camera_config.yaml and specify the following parameters:

camera_width: value_double
camera_height: value_double
camera_fx: value_double_millimeters
camera_fy: value_double_millimeters
camera_cx: value_double
camera_cy: value_double
camera_znear: value_double_meters
camera_zfar: value_double_meters

These are the image dimensions, focal length, principal point, and near and far clipping distances. If you don't know the last two, set them to something reasonable, e.g., 0.1 and 20.0 respectively.
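As a concrete sketch, a camera_config.yaml for a VGA-resolution depth sensor might look like the following. The numbers are placeholders only (Kinect-style pixel-unit intrinsics); double-check the units your config expects and substitute the calibrated values reported by your sensor (e.g., on its camera_info topic).

camera_width: 640.0
camera_height: 480.0
camera_fx: 525.0
camera_fy: 525.0
camera_cx: 319.5
camera_cy: 239.5
camera_znear: 0.1
camera_zfar: 20.0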

Launch File

The final step is to inform the actionlib server of your sensor topic, world frame, and environment bounds. Copy or edit object_recognition_node/launch/pr2_object_recognition.launch and specify the following parameters:

input_cloud: your_organized_point_cloud_topic
reference_frame: world_frame (Note: world_frame's z-axis should be aligned with negative gravity)
camera_frame: frame_of_camera_generating_input_cloud (body frame convention: x forward, y left, z up)
table_height: the z-distance (in reference_frame) representing a flat surface on which the objects are located
xmin, xmax, ymin, ymax: the bounds (in reference_frame) within which all objects are located (tighter bounds usually mean shorter run times)

Note: if you have an RGB-D sensor, it's likely that you want a "depth-registered" sensor topic, meaning the input_cloud is in the frame of the RGB camera. If so, make sure that camera_frame is set to the RGB camera body frame, and that the camera parameters (from the previous step) are those of the RGB camera. Also, it's assumed that the TF tree has a transform from camera_frame to reference_frame.
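For orientation, a stripped-down launch file might look like the sketch below. The node pkg/type/name and every value shown are placeholders (a PR2-style setup is assumed); in practice, copy pr2_object_recognition.launch and only change the parameter values.

<launch>
  <!-- Node pkg/type/name are placeholders; copy the real ones from pr2_object_recognition.launch -->
  <node pkg="object_recognition_node" type="object_recognition_node" name="perch" output="screen">
    <!-- Organized, depth-registered point cloud topic (example topic name) -->
    <param name="input_cloud" value="/head_mount_kinect/depth_registered/points"/>
    <!-- Frame whose z-axis points opposite to gravity -->
    <param name="reference_frame" value="base_footprint"/>
    <!-- Body frame of the camera producing input_cloud -->
    <param name="camera_frame" value="head_mount_kinect_rgb_link"/>
    <!-- Illustrative table height and workspace bounds, in reference_frame (meters) -->
    <param name="table_height" value="0.75"/>
    <param name="xmin" value="0.4"/>
    <param name="xmax" value="1.2"/>
    <param name="ymin" value="-0.5"/>
    <param name="ymax" value="0.5"/>
  </node>
</launch>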

To bring up the actionlib server, run

roslaunch object_recognition_node <pr2_object_recognition.launch/your_custom_file.launch>

To test that everything works, bring up RViz and add a visualization marker topic named perch_marker (as well as your sensor point cloud topic), then spin up an actionlib client and send a request. An example Python script can be found in object_recognition_node/scripts/pr2_perch_client.py. Run it as:

rosrun object_recognition_node pr2_perch_client <object_name_1> <object_name_2> ... <object_name_n>

If all goes well, you should see the 3D models of the objects superimposed on the sensor point cloud!
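If you'd rather write your own client, a minimal one along the lines of pr2_perch_client.py might look like the sketch below. The action server name and the goal/result field names (object_ids, object_poses) are assumptions here; verify them against DoPerch.action and the example script.

#!/usr/bin/env python
# Minimal DPL actionlib client sketch. The server name and the goal/result
# field names are assumptions -- check DoPerch.action for the real definition.
import sys

import actionlib
import rospy
from actionlib_msgs.msg import GoalStatus
from object_recognition_node.msg import DoPerchAction, DoPerchGoal


def main():
    rospy.init_node('perch_client_example')

    # Placeholder server name; use the one advertised by your launch file.
    client = actionlib.SimpleActionClient('perch_server', DoPerchAction)
    client.wait_for_server()

    goal = DoPerchGoal()
    goal.object_ids = sys.argv[1:]  # object_name values from the objects XML file
    client.send_goal(goal)
    client.wait_for_result()

    if client.get_state() == GoalStatus.SUCCEEDED:
        result = client.get_result()
        rospy.loginfo('Localized %d object(s)', len(result.object_poses))
    else:
        rospy.logwarn('PERCH could not find a solution (goal aborted)')


if __name__ == '__main__':
    main()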

TODO: add debugging tips