Replicating results on KITTI / converting results to KITTI format #60

AndreasLH opened this issue Nov 26, 2024 · 0 comments

I am trying to replicate the results from table 3 of the paper, as seen below
[image: Table 3 from the paper]

My primary concern lies in how you convert the results to the KITTI format. Can you elaborate on this, or compare it to how I do it below?

As you can see, when I run the code I am not able to fully reproduce these results. For my purposes I am evaluating on the KITTI validation set, but I would still expect the numbers to be very similar. I am using the evaluation code from the MonoDETR repo; the results marked in red are AP3D@70 and the ones in green are AP3D@50. I am using your pretrained outdoor model, and while I realise that you use a model trained only on KITTI for table 3, I would not expect such a large gap.

As you can see, my results are:

|          | Easy  | Med   | Hard  |
|----------|-------|-------|-------|
| AP3D@70  | 14.67 | 11.73 | 10.46 |
| APBEV@70 | 23.04 | 18.17 | 15.88 |

[image: evaluation output screenshot]

My code for converting the output to the KITTI format is as follows.

`instances_predictions.pth` is the output generated by the model; to produce it I am using a slightly modified version of your demo script (the dump step is roughly sketched below).
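
To make the assumed input format explicit, this is roughly how I collect the per-image prediction dicts before saving them with `torch.save`. The field names (`center_cam`, `center_2D`, `dimensions`, `pose`, etc.) are simply what my conversion code below expects; the exact attributes on the demo script's `instances` may differ, so treat this as a sketch rather than the actual dump code:

    import torch
    from detectron2.structures import BoxMode

    def dump_predictions(model, data_loader, out_file='instances_predictions.pth'):
        results = []
        for batched_inputs in data_loader:
            outputs = model(batched_inputs)
            for inp, out in zip(batched_inputs, outputs):
                instances = out['instances'].to('cpu')
                preds = []
                for i in range(len(instances)):
                    # pred_boxes are XYXY in detectron2; store XYWH_ABS because
                    # the conversion code below converts XYWH -> XYXY
                    box_xyxy = instances.pred_boxes[i].tensor.numpy().tolist()[0]
                    box_xywh = BoxMode.convert(box_xyxy, BoxMode.XYXY_ABS, BoxMode.XYWH_ABS)
                    preds.append({
                        'category_id': int(instances.pred_classes[i]),
                        'bbox': box_xywh,
                        'center_cam': instances.pred_center_cam[i].numpy().tolist(),
                        'center_2D': instances.pred_center_2D[i].numpy().tolist(),
                        'dimensions': instances.pred_dimensions[i].numpy().tolist(),
                        'pose': instances.pred_pose[i].numpy().tolist(),
                        'score': float(instances.scores[i]),
                    })
                results.append({
                    'image_id': inp['image_id'],
                    'K': inp['K'],
                    'width': inp['width'],
                    'height': inp['height'],
                    'instances': preds,
                })
        torch.save(results, out_file)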

    import os
    import numpy as np
    import torch
    from tqdm import tqdm
    from detectron2.structures import BoxMode
    # assumed to be defined elsewhere: thing_classes (id -> class name),
    # cats (classes to keep), input_folder, out_path, and mat2euler
    # (rotation matrix -> Euler angles, e.g. transforms3d.euler.mat2euler)

    in_path = 'output/'+input_folder+'/KITTI_pred/instances_predictions.pth'
    data_json = torch.load(in_path)  # instances_predictions.pth output from the model

    files = {}
    for image in tqdm(data_json):
        K = image['K']
        K_inv = np.linalg.inv(K)
        width, height = image['width'], image['height']
        image_id = image['image_id']
        l = []
        for pred in image['instances']:

            category = thing_classes[pred['category_id']]
            if category not in cats:
                continue
            occluded = 0
            truncation = 0.0 # it does not matter
            rotation_y = mat2euler(np.array(pred['pose']))[1]
            bbox = BoxMode.convert(pred['bbox'], BoxMode.XYWH_ABS, BoxMode.XYXY_ABS) # (x, y, w, h) -> (left, top, right, bottom)
            h3d, w3d, l3d = pred['dimensions']
            # unproject, this should yield the same 
            # cen_2d = np.array(pred['center_2D'] + [1])
            # z3d = pred['center_cam'][2]
            # x3d, y3d, z3d = (K_inv @ (z3d*cen_2d))

            x3d, y3d, z3d = pred['center_cam']

            location = pred['center_cam']
            score = pred['score']
            alpha = calculate_alpha(location, rotation_y)

            # convert to KITTI format
            li = [category, truncation, occluded, alpha, bbox[0], bbox[1], bbox[2], bbox[3], h3d, w3d, l3d, x3d, y3d, z3d, rotation_y, score]
            l.append(li)
        # sort l by z3d (not necessary)
        l = sorted(l, key=lambda x: x[13])
        files[image_id] = l

    # 7518 test images
    os.makedirs(out_path, exist_ok=True)
    for img_id, content in files.items():

        img_id_str = str(img_id).zfill(6)
        with open(out_path+f'{img_id_str}.txt', 'w') as f:
            str_i = ''
            for i in content:
                # t = f'{category} {truncation:.2f} {occluded} {alpha:.2f} {bbox[0]:.2f} {bbox[1]:.2f} {bbox[2]:.2f} {bbox[3]:.2f} {w3d:.2f} {h3d:.2f} {l3d:.2f} {x3d:.2f} {y3d:.2f} {z3d:.2f} {rotation_y:.2f} {score:.2f}\n'
                t = f'{i[0][0].upper() + i[0][1:]} {i[1]:.2f} {i[2]} {i[3]:.2f} {i[4]:.2f} {i[5]:.2f} {i[6]:.2f} {i[7]:.2f} {i[8]:.2f} {i[9]:.2f} {i[10]:.2f} {i[11]:.2f} {i[12]:.2f} {i[13]:.2f} {i[14]:.2f} {i[15]:.2f}\n'
                str_i += t
            f.write(str_i)
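
For reference, the per-line field order I am targeting is the standard KITTI label/result layout: `type truncated occluded alpha left top right bottom height width length x y z rotation_y score`. A minimal parser sketch for sanity-checking the generated files (hypothetical helper, not from the repo):

    def parse_kitti_line(line):
        f = line.split()
        return {
            'type': f[0],
            'truncated': float(f[1]),
            'occluded': int(float(f[2])),
            'alpha': float(f[3]),
            'bbox': [float(v) for v in f[4:8]],         # left, top, right, bottom (pixels)
            'dimensions': [float(v) for v in f[8:11]],  # height, width, length (metres)
            'location': [float(v) for v in f[11:14]],   # x, y, z in camera coordinates (metres)
            'rotation_y': float(f[14]),
            'score': float(f[15]) if len(f) > 15 else None,
        }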

Helper functions

    def perp_vector(a, b):
        # vector perpendicular to (a, b), rotated 90 degrees clockwise
        return np.array([b, -a])

    def rotate_vector(x, y, theta):
        # rotate (x, y) counter-clockwise by theta
        x_rotated = x * np.cos(theta) - y * np.sin(theta)
        y_rotated = x * np.sin(theta) + y * np.cos(theta)
        return np.array([x_rotated, y_rotated])

    def calculate_alpha(location, ry):
        '''
        location: x, y, z coordinates
        ry: rotation around the y-axis, negative is counter-clockwise;
            the positive x-axis points to the right.

        Calculates the signed angle between a line perpendicular to the
        camera->object direction and the heading direction given by ry.
        '''
        ry = -ry
        x, y, z = location
        # vector from the camera at [0, 0, 0] to the center of the bounding box;
        # we can do the whole thing in 2D (top-down view)
        # vector perpendicular to the camera->center vector
        perpendicular = perp_vector(x, z)
        # unit vector corresponding to ry
        ry_vector = np.array([np.cos(ry), np.sin(ry)])
        # signed angle between perpendicular and ry_vector
        dot = perpendicular[0]*ry_vector[0] + perpendicular[1]*ry_vector[1]  # dot product
        det = perpendicular[0]*ry_vector[1] - perpendicular[1]*ry_vector[0]  # determinant
        alpha = -np.arctan2(det, dot)

        # wrap to [-pi, pi]
        if alpha > np.pi:
            alpha -= 2*np.pi
        if alpha < -np.pi:
            alpha += 2*np.pi
        return alpha
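
As a cross-check on `calculate_alpha`: the formulation I usually see in KITTI tooling is simply `alpha = rotation_y - arctan2(x, z)`, wrapped to [-pi, pi]. A minimal sketch (hypothetical helper, just for comparing against the function above):

    def calculate_alpha_simple(location, ry):
        # observation angle as commonly computed for KITTI:
        # alpha = rotation_y - atan2(x, z), wrapped to [-pi, pi]
        x, _, z = location
        alpha = ry - np.arctan2(x, z)
        return (alpha + np.pi) % (2 * np.pi) - np.pi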