Replicating results on KITTI / converting results to KITTI format #60

AndreasLH opened this issue Nov 26, 2024 · 0 comments

I am trying to replicate the results from table 3 of the paper, as seen below
[image: Table 3 from the paper]

My primary concern lies in how you convert the results to the KITTI format. Can you elaborate on this, or compare it to how I do it below?

As you can see, when I run the code I am not able to fully reproduce these results. For my purposes I am evaluating on the KITTI validation set, but I would still expect the numbers to be very similar. I am using the evaluation code from the MonoDETR repo; the results marked in red are AP3D@70 and the ones in green are AP3D@50. I am using your pretrained outdoor model, and while I realise that you use a model trained only on KITTI for table 3, I would not expect such a large gap.

As you can see, my results are:

|          | Easy  | Med   | Hard  |
|----------|-------|-------|-------|
| AP3D@70  | 14.67 | 11.73 | 10.46 |
| APBEV@70 | 23.04 | 18.17 | 15.88 |

[image: evaluation output screenshot]

My code for converting the output to the KITTI format is as follows.

`instances_predictions.pth` is the output generated by the model; to produce it I am using a slightly modified version of your demo script (the dump step is roughly sketched below).
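
To make the assumed input format explicit, this is roughly how I collect the per-image prediction dicts before saving them with `torch.save`. The field names (`center_cam`, `center_2D`, `dimensions`, `pose`, etc.) are simply what my conversion code below expects; the exact attributes on the demo script's `instances` may differ, so treat this as a sketch rather than the actual dump code:

    import torch
    from detectron2.structures import BoxMode

    def dump_predictions(model, data_loader, out_file='instances_predictions.pth'):
        results = []
        for batched_inputs in data_loader:
            outputs = model(batched_inputs)
            for inp, out in zip(batched_inputs, outputs):
                instances = out['instances'].to('cpu')
                preds = []
                for i in range(len(instances)):
                    # pred_boxes are XYXY in detectron2; store XYWH_ABS because
                    # the conversion code below converts XYWH -> XYXY
                    box_xyxy = instances.pred_boxes[i].tensor.numpy().tolist()[0]
                    box_xywh = BoxMode.convert(box_xyxy, BoxMode.XYXY_ABS, BoxMode.XYWH_ABS)
                    preds.append({
                        'category_id': int(instances.pred_classes[i]),
                        'bbox': box_xywh,
                        'center_cam': instances.pred_center_cam[i].numpy().tolist(),
                        'center_2D': instances.pred_center_2D[i].numpy().tolist(),
                        'dimensions': instances.pred_dimensions[i].numpy().tolist(),
                        'pose': instances.pred_pose[i].numpy().tolist(),
                        'score': float(instances.scores[i]),
                    })
                results.append({
                    'image_id': inp['image_id'],
                    'K': inp['K'],
                    'width': inp['width'],
                    'height': inp['height'],
                    'instances': preds,
                })
        torch.save(results, out_file)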

    import os
    import numpy as np
    import torch
    from tqdm import tqdm
    from detectron2.structures import BoxMode
    # assumed to be defined elsewhere: thing_classes (id -> class name),
    # cats (classes to keep), input_folder, out_path, and mat2euler
    # (rotation matrix -> Euler angles, e.g. transforms3d.euler.mat2euler)

    in_path = 'output/'+input_folder+'/KITTI_pred/instances_predictions.pth'
    data_json = torch.load(in_path)  # instances_predictions.pth output from the model

    files = {}
    for image in tqdm(data_json):
        K = image['K']
        K_inv = np.linalg.inv(K)
        width, height = image['width'], image['height']
        image_id = image['image_id']
        l = []
        for pred in image['instances']:

            category = thing_classes[pred['category_id']]
            if category not in cats:
                continue
            occluded = 0
            truncation = 0.0 # it does not matter
            rotation_y = mat2euler(np.array(pred['pose']))[1]
            bbox = BoxMode.convert(pred['bbox'], BoxMode.XYWH_ABS, BoxMode.XYXY_ABS) # (x, y, w, h) -> (left, top, right, bottom)
            h3d, w3d, l3d = pred['dimensions']
            # unproject, this should yield the same 
            # cen_2d = np.array(pred['center_2D'] + [1])
            # z3d = pred['center_cam'][2]
            # x3d, y3d, z3d = (K_inv @ (z3d*cen_2d))

            x3d, y3d, z3d = pred['center_cam']

            location = pred['center_cam']
            score = pred['score']
            alpha = calculate_alpha(location, rotation_y)

            # convert to KITTI format
            li = [category, truncation, occluded, alpha, bbox[0], bbox[1], bbox[2], bbox[3], h3d, w3d, l3d, x3d, y3d, z3d, rotation_y, score]
            l.append(li)
        # sort l by z3d (not necessary)
        l = sorted(l, key=lambda x: x[13])
        files[image_id] = l

    # 7518 test images
    os.makedirs(out_path, exist_ok=True)
    for img_id, content in files.items():

        img_id_str = str(img_id).zfill(6)
        with open(out_path+f'{img_id_str}.txt', 'w') as f:
            str_i = ''
            for i in content:
                # t = f'{category} {truncation:.2f} {occluded} {alpha:.2f} {bbox[0]:.2f} {bbox[1]:.2f} {bbox[2]:.2f} {bbox[3]:.2f} {w3d:.2f} {h3d:.2f} {l3d:.2f} {x3d:.2f} {y3d:.2f} {z3d:.2f} {rotation_y:.2f} {score:.2f}\n'
                t = f'{i[0][0].upper() + i[0][1:]} {i[1]:.2f} {i[2]} {i[3]:.2f} {i[4]:.2f} {i[5]:.2f} {i[6]:.2f} {i[7]:.2f} {i[8]:.2f} {i[9]:.2f} {i[10]:.2f} {i[11]:.2f} {i[12]:.2f} {i[13]:.2f} {i[14]:.2f} {i[15]:.2f}\n'
                str_i += t
            f.write(str_i)
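
For reference, the per-line field order I am targeting is the standard KITTI label/result layout: `type truncated occluded alpha left top right bottom height width length x y z rotation_y score`. A minimal parser sketch for sanity-checking the generated files (hypothetical helper, not from the repo):

    def parse_kitti_line(line):
        f = line.split()
        return {
            'type': f[0],
            'truncated': float(f[1]),
            'occluded': int(float(f[2])),
            'alpha': float(f[3]),
            'bbox': [float(v) for v in f[4:8]],         # left, top, right, bottom (pixels)
            'dimensions': [float(v) for v in f[8:11]],  # height, width, length (metres)
            'location': [float(v) for v in f[11:14]],   # x, y, z in camera coordinates (metres)
            'rotation_y': float(f[14]),
            'score': float(f[15]) if len(f) > 15 else None,
        }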

Helper functions

    def perp_vector(a, b):
        # vector perpendicular to (a, b), rotated 90 degrees clockwise
        return np.array([b, -a])

    def rotate_vector(x, y, theta):
        # rotate (x, y) counter-clockwise by theta
        x_rotated = x * np.cos(theta) - y * np.sin(theta)
        y_rotated = x * np.sin(theta) + y * np.cos(theta)
        return np.array([x_rotated, y_rotated])

    def calculate_alpha(location, ry):
        '''
        location: x, y, z coordinates
        ry: rotation around the y-axis, negative is counter-clockwise;
            the positive x-axis points to the right.

        Calculates the signed angle between a line perpendicular to the
        camera->object direction and the heading direction given by ry.
        '''
        ry = -ry
        x, y, z = location
        # vector from the camera at [0, 0, 0] to the center of the bounding box;
        # we can do the whole thing in 2D (top-down view)
        # vector perpendicular to the camera->center vector
        perpendicular = perp_vector(x, z)
        # unit vector corresponding to ry
        ry_vector = np.array([np.cos(ry), np.sin(ry)])
        # signed angle between perpendicular and ry_vector
        dot = perpendicular[0]*ry_vector[0] + perpendicular[1]*ry_vector[1]  # dot product
        det = perpendicular[0]*ry_vector[1] - perpendicular[1]*ry_vector[0]  # determinant
        alpha = -np.arctan2(det, dot)

        # wrap to [-pi, pi]
        if alpha > np.pi:
            alpha -= 2*np.pi
        if alpha < -np.pi:
            alpha += 2*np.pi
        return alpha
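
As a cross-check on `calculate_alpha`: the formulation I usually see in KITTI tooling is simply `alpha = rotation_y - arctan2(x, z)`, wrapped to [-pi, pi]. A minimal sketch (hypothetical helper, just for comparing against the function above):

    def calculate_alpha_simple(location, ry):
        # observation angle as commonly computed for KITTI:
        # alpha = rotation_y - atan2(x, z), wrapped to [-pi, pi]
        x, _, z = location
        alpha = ry - np.arctan2(x, z)
        return (alpha + np.pi) % (2 * np.pi) - np.pi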