We previously designed a model-based scooping method via motion control with a minimalist hardware design: a two-fingered parallel-jaw gripper with a fixed-length finger and a variable-length thumb. Executing it in a bin scenario requires instance segmentation with Mask R-CNN and pose estimation with Open3D 0.7.0.0. Moreover, the model analyzes a single object on a flat surface and cannot capture the complex interactions of a 3-D bin environment, so the model-based method is difficult to apply to a heterogeneous cluster of unseen objects. We therefore design a supervised hierarchical learning framework that predicts the parameters of the scooping action directly from an RGB-D image of the bin scenario. Here are some video clips of the experiments.
There are five parameters to be predicted: the finger position 𝑝, the horizontal distance between the two fingers 𝑑, and the ZYX Euler angles of the gripper orientation: yaw 𝛼, pitch 𝛽, and roll 𝛾. We design a hierarchical three-tier learning method whose input is the RGB-D image of the bin scenario. Tier 1 predicts the finger position 𝑝 and the yaw 𝛼, Tier 2 predicts the distance 𝑑, and Tier 3 predicts the remaining two parameters 𝛽 and 𝛾. See the following figure:
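In code, the hierarchical inference flow can be sketched as follows. This is a minimal illustration assuming three trained PyTorch networks with hypothetical names (`tier1_net`, `tier2_net`, `tier3_net`) and interfaces; the actual architectures are defined in the training notebooks below.

```python
import torch

def predict_scooping_action(heightmap, tier1_net, tier2_net, tier3_net):
    """Hierarchically predict the five scooping parameters from an RGB-D heightmap.

    heightmap: (4, H, W) tensor (RGB channels plus height channel).
    The tier networks are assumed to be trained PyTorch modules (hypothetical names).
    """
    with torch.no_grad():
        x = heightmap.unsqueeze(0)          # add batch dimension
        # Tier 1: finger position p (pixel coordinates) and yaw angle alpha
        p, alpha = tier1_net(x)
        # Tier 2: horizontal finger-thumb distance d, conditioned on Tier 1's output
        d = tier2_net(x, p, alpha)
        # Tier 3: remaining orientation angles, pitch beta and roll gamma
        beta, gamma = tier3_net(x, p, alpha, d)
    return p, alpha, d, beta, gamma
```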
- Universal Robot UR10
- Robotiq 140mm Adaptive parallel-jaw gripper
- RealSense Camera L515
- Customized gripper design comprising a variable-length thumb and a dual-material finger: the variable-length thumb realizes the finger length difference needed for scooping, and the dual-material fingertip enables combining dig-grasping and scooping.
This implementation requires the following dependencies (tested on Ubuntu 16.04 LTS):
- ROS Kinetic
- Urx for UR10 robot control
- robotiq_2finger_grippers: ROS driver for Robotiq Adaptive Grippers
- pySerial for accessing the Arduino through a serial connection and controlling the extendable finger (see the serial-control sketch after this list).
- PyBullet for collision check
- PyTorch for constructing and training the network
- pyrealsense2: a Python wrapper for the RealSense camera.
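For the extendable finger, the Arduino is reached over a serial link via pySerial. The following is a minimal sketch assuming a hypothetical one-byte command protocol, port name, and baud rate; the actual protocol is defined by the Arduino firmware.

```python
import serial
import time

# Hypothetical port name and baud rate; the actual values depend on the Arduino setup.
arduino = serial.Serial('/dev/ttyACM0', baudrate=115200, timeout=1)
time.sleep(2)                      # wait for the Arduino to reset after the port opens

def set_thumb_length(extend: bool):
    """Send a hypothetical one-byte command to extend ('E') or retract ('R') the thumb."""
    arduino.write(b'E' if extend else b'R')
    arduino.flush()

set_thumb_length(extend=True)      # extend the thumb before scooping
```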
We use a RealSense L515 camera to capture the RGB image and the depth image, and combine them into an RGB-D heightmap. A heightmap is an RGB-D image obtained from a 3D point cloud that describes the 3D information of the bin scenario: each pixel maps linearly to a horizontal position in the world frame, and its value stores the height above the bin bottom.
python utils/heightmap.py
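For illustration, here is a minimal sketch of how such a heightmap can be constructed from a colored point cloud expressed in the world frame. utils/heightmap.py is the actual implementation; the function name, workspace format, and resolution below are assumptions.

```python
import numpy as np

def build_heightmap(points_world, colors, workspace, resolution=0.002):
    """Project a colored point cloud (world frame) onto a top-down RGB-D heightmap.

    points_world: (N, 3) array of XYZ points; colors: (N, 3) RGB values in [0, 255].
    workspace: ((x_min, x_max), (y_min, y_max), z_bottom) describing the bin region.
    Each pixel maps linearly to a horizontal (x, y) position and stores the height
    above the bin bottom, as described above.
    """
    (x_min, x_max), (y_min, y_max), z_bottom = workspace
    w = int(round((x_max - x_min) / resolution))
    h = int(round((y_max - y_min) / resolution))
    color_map = np.zeros((h, w, 3), dtype=np.uint8)
    height_map = np.zeros((h, w), dtype=np.float32)

    # Keep only points inside the bin workspace.
    mask = ((points_world[:, 0] >= x_min) & (points_world[:, 0] < x_max) &
            (points_world[:, 1] >= y_min) & (points_world[:, 1] < y_max))
    pts, cols = points_world[mask], colors[mask]

    # Pixel coordinates are a linear function of the horizontal world position.
    u = ((pts[:, 0] - x_min) / resolution).astype(int)
    v = ((pts[:, 1] - y_min) / resolution).astype(int)
    z = pts[:, 2] - z_bottom                      # height from the bin bottom

    # Keep the highest point per pixel (simple z-buffer: later writes win).
    order = np.argsort(z)
    height_map[v[order], u[order]] = z[order]
    color_map[v[order], u[order]] = cols[order]
    return color_map, height_map
```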
Here is the set of heightmaps describing the clusters of Go stones, domino blocks, and acrylic boards: image set
We also provide annotation software to label the data.
learned_scooping/annotating_software/label_Tier1.py
is for Tier 1: pixels that should (should not) be the target finger position are labeled green (red). You can choose the shape and size of the brush.
learned_scooping/annotating_software/label_Tier2.py
is for Tier 2: given the target finger position, we label the target thumb position.
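As an illustration of the Tier 1 labeling workflow (green for valid finger positions, red for invalid ones), the sketch below paints labels onto a heightmap image with an adjustable OpenCV brush. This is not the actual annotating software; the file names and key bindings are assumptions.

```python
import cv2
import numpy as np

image = cv2.imread('heightmap_color.png')          # hypothetical heightmap image
label = np.zeros(image.shape[:2], dtype=np.uint8)  # 0 = unlabeled, 1 = positive, 2 = negative
brush_radius, positive = 5, True

def paint(event, x, y, flags, param):
    # Paint a circular brush while the left mouse button is pressed or dragged.
    if event == cv2.EVENT_LBUTTONDOWN or (event == cv2.EVENT_MOUSEMOVE and flags & cv2.EVENT_FLAG_LBUTTON):
        cv2.circle(label, (x, y), brush_radius, 1 if positive else 2, -1)

cv2.namedWindow('label_tier1')
cv2.setMouseCallback('label_tier1', paint)
while True:
    overlay = image.copy()
    overlay[label == 1] = (0, 255, 0)              # green: valid finger positions
    overlay[label == 2] = (0, 0, 255)              # red: invalid finger positions
    cv2.imshow('label_tier1', overlay)
    key = cv2.waitKey(20) & 0xFF
    if key == ord('p'):                            # toggle positive/negative brush
        positive = not positive
    elif key in (ord('+'), ord('-')):              # adjust brush size
        brush_radius = max(1, brush_radius + (1 if key == ord('+') else -1))
    elif key == ord('s'):                          # save the label mask and quit
        np.save('label_tier1.npy', label)
        break
cv2.destroyAllWindows()
```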
- Train Tier 1:
learned_scooping/training_program/training_tier1.ipynb
- Train Tier 2:
learned_scooping/training_program/training_tier2.ipynb
- Train Tier 3:
learned_scooping/training_program/training_tier3.ipynb
Note: Jupyter Notebook is needed to run the training notebooks. A minimal sketch of a possible training step is shown below.
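For reference, the sketch below shows what a single Tier 1 training step might look like, assuming per-pixel supervision from the green/red annotation masks. The actual architectures, losses, and data pipelines are defined in the notebooks above; every name here is an assumption.

```python
import torch
import torch.nn as nn

# Hypothetical fully-convolutional network; the real Tier 1 architecture is in training_tier1.ipynb.
tier1_net = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),                         # per-pixel score for "good finger position"
)
optimizer = torch.optim.Adam(tier1_net.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

def train_step(heightmap_batch, label_batch):
    """One supervised step. heightmap_batch: (B, 4, H, W) RGB-D heightmaps;
    label_batch: (B, 1, H, W) with 1 for green (valid) and 0 for red (invalid) pixels."""
    optimizer.zero_grad()
    logits = tier1_net(heightmap_batch)
    loss = criterion(logits, label_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```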
- Network parameters for Tier 1: netparam_Tier1.pkl
- Network parameters for Tier 2: netparam_Tier2.pkl
- Network parameters for Tier 3: netparam_Tier3.pkl
Please run the following program: learned_scooping/test_on_real_robot.py
Before running the program, first download the network parameters and save them locally. Then, update the corresponding file paths in the code (line 40, line 42, and line 44).
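Loading the downloaded parameter files in PyTorch might look like the following sketch, assuming they are checkpoints saved with torch.save(); the directory path is a placeholder, and the network definitions live in test_on_real_robot.py.

```python
import torch

# Hypothetical local directory; replace it with wherever the parameter files were saved,
# and update the paths at lines 40, 42, and 44 of test_on_real_robot.py accordingly.
PARAM_DIR = '/home/user/learned_scooping/params'

# Assuming the .pkl files are PyTorch checkpoints saved with torch.save().
tier1_params = torch.load(f'{PARAM_DIR}/netparam_Tier1.pkl', map_location='cpu')
tier2_params = torch.load(f'{PARAM_DIR}/netparam_Tier2.pkl', map_location='cpu')
tier3_params = torch.load(f'{PARAM_DIR}/netparam_Tier3.pkl', map_location='cpu')

# tier1_net.load_state_dict(tier1_params)   # each network is constructed in test_on_real_robot.py
```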
This can be successfully applied to clusters of Go stones, domino blocks, acrylic boards, and key-shaped 3D-printed models.
For any technical issues, please contact: Tierui He ([email protected]).