I am trying to replicate the results from Table 3 of the paper, as seen below.
My primary concern lies in how you convert the results to the KITTI format. Can you elaborate on this, or compare it to how I do it?
As you can see, when I run the code I am not able to fully reproduce the results. For my purpose I am using the KITTI validation set, but I would still expect the results to be very similar. I am using the evaluation code from the MonoDETR repo; the results marked in red are AP3D@70 and the ones in green are AP3D@50. I am using your pretrained outdoor model, and while I realise that you use a model trained only on KITTI for Table 3, I would not expect such a large gap.
My results are:
|          | Easy  | Med   | Hard  |
|----------|-------|-------|-------|
| AP3D@70  | 14.67 | 11.73 | 10.46 |
| APBEV@70 | 23.04 | 18.17 | 15.88 |
My code for converting the output to KITTI is as follows:
`instances_predictions.pth` is the output generated by the model; to produce it, I am using a slightly modified version of your demo script.
```python
import os
import numpy as np
import torch
from tqdm import tqdm
from detectron2.structures import BoxMode
from transforms3d.euler import mat2euler  # rotation matrix -> Euler angles (adjust to your own helper if different)

# input_folder, out_path, thing_classes and cats are defined earlier in my script
in_path = 'output/' + input_folder + '/KITTI_pred/instances_predictions.pth'
data_json = torch.load(in_path)  # instances_predictions.pth output from the model
files = {}

for image in tqdm(data_json):
    K = image['K']
    K_inv = np.linalg.inv(K)
    width, height = image['width'], image['height']
    image_id = image['image_id']
    l = []
    for pred in image['instances']:
        category = thing_classes[pred['category_id']]
        if category not in cats:
            continue
        occluded = 0
        truncation = 0.0  # it does not matter
        rotation_y = mat2euler(np.array(pred['pose']))[1]
        # x, y, w, h -> left, top, right, bottom
        bbox = BoxMode.convert(pred['bbox'], BoxMode.XYWH_ABS, BoxMode.XYXY_ABS)
        h3d, w3d, l3d = pred['dimensions']
        # unproject, this should yield the same:
        # cen_2d = np.array(pred['center_2D'] + [1])
        # z3d = pred['center_cam'][2]
        # x3d, y3d, z3d = (K_inv @ (z3d * cen_2d))
        x3d, y3d, z3d = pred['center_cam']
        location = pred['center_cam']
        score = pred['score']
        alpha = calculate_alpha(location, rotation_y)
        # convert to KITTI format
        li = [category, truncation, occluded, alpha, bbox[0], bbox[1], bbox[2], bbox[3],
              h3d, w3d, l3d, x3d, y3d, z3d, rotation_y, score]
        l.append(li)
    # sort l by z3d (not necessary)
    l = sorted(l, key=lambda x: x[13])
    files[image_id] = l

# 7518 test images
os.makedirs(out_path, exist_ok=True)
for img_id, content in files.items():
    img_id_str = str(img_id).zfill(6)
    with open(out_path + f'{img_id_str}.txt', 'w') as f:
        str_i = ''
        for i in content:
            # t = f'{category} {truncation:.2f} {occluded} {alpha:.2f} {bbox[0]:.2f} {bbox[1]:.2f} {bbox[2]:.2f} {bbox[3]:.2f} {w3d:.2f} {h3d:.2f} {l3d:.2f} {x3d:.2f} {y3d:.2f} {z3d:.2f} {rotation_y:.2f} {score:.2f}\n'
            t = f'{i[0][0].upper() + i[0][1:]} {i[1]:.2f} {i[2]} {i[3]:.2f} {i[4]:.2f} {i[5]:.2f} {i[6]:.2f} {i[7]:.2f} {i[8]:.2f} {i[9]:.2f} {i[10]:.2f} {i[11]:.2f} {i[12]:.2f} {i[13]:.2f} {i[14]:.2f} {i[15]:.2f}\n'
            str_i += t
        f.write(str_i)
```
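For reference, the KITTI label fields I write per object are: type, truncated, occluded, alpha, 2D bbox (left, top, right, bottom), dimensions (h, w, l), location (x, y, z), rotation_y, score. One thing I am not sure about is the location convention: as far as I understand, the KITTI devkit treats (x, y, z) as the bottom-center of the 3D box (y points down in camera coordinates), while `center_cam` is presumably the box center, so a shift of h/2 along y might be needed. A minimal sketch of what I mean (the `kitti_location` helper is my own, not from the repo):

```python
def kitti_location(center_cam, h3d):
    # Assumption: KITTI's location is the bottom-center of the 3D box, while
    # center_cam is the geometric center; y points down in camera coordinates,
    # so the bottom face sits at y_center + h/2. Not verified against the
    # pretrained model's output convention.
    x3d, y3d, z3d = center_cam
    return x3d, y3d + h3d / 2.0, z3d
```

If `center_cam` already follows the KITTI convention, this shift would of course be wrong; I only mention it as a possible source of the gap.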
Helper functions
```python
def perp_vector(a, b):
    return np.array([b, -a])

def rotate_vector(x, y, theta):
    # Calculate the rotated coordinates
    x_rotated = x * np.cos(theta) - y * np.sin(theta)
    y_rotated = x * np.sin(theta) + y * np.cos(theta)
    return np.array([x_rotated, y_rotated])

def calculate_alpha(location, ry):
    '''location: x, y, z coordinates
    ry: rotation around y-axis, negative counter-clockwise, positive x-axis is to the right
    Calculate the angle from a line perpendicular to the camera to the center of the bounding box.'''
    ry = -ry
    # vector from [0, 0, 0] to the center of the bounding box;
    # we can do the whole thing in 2D, top-down view
    x, y, z = location
    # vector perpendicular to the camera-to-center direction
    perpendicular = perp_vector(x, z)
    # vector corresponding to ry
    ry_vector = np.array([np.cos(ry), np.sin(ry)])
    # angle between perpendicular and ry_vector
    dot = perpendicular[0] * ry_vector[0] + perpendicular[1] * ry_vector[1]  # dot product
    det = perpendicular[0] * ry_vector[1] - perpendicular[1] * ry_vector[0]  # determinant
    alpha = -np.arctan2(det, dot)
    # wrap to [-pi, pi]
    if alpha > np.pi:
        alpha -= 2 * np.pi
    if alpha < -np.pi:
        alpha += 2 * np.pi
    return alpha
```
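As a sanity check on `calculate_alpha`, the KITTI observation angle can also be computed directly as `alpha = rotation_y - arctan2(x, z)`, wrapped to [-pi, pi]; if I have the conventions right, the two should agree up to wrapping. A small sketch of that check (`kitti_alpha` is only for comparison, not part of my conversion):

```python
def kitti_alpha(location, rotation_y):
    # observation angle: rotation_y minus the viewing angle of the object center
    x, _, z = location
    alpha = rotation_y - np.arctan2(x, z)
    return (alpha + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]

# compare the two implementations on a few random poses
rng = np.random.default_rng(0)
for _ in range(5):
    loc = [rng.uniform(-20, 20), rng.uniform(-1.0, 2.0), rng.uniform(1.0, 60.0)]
    ry = rng.uniform(-np.pi, np.pi)
    diff = calculate_alpha(loc, ry) - kitti_alpha(loc, ry)
    assert abs((diff + np.pi) % (2 * np.pi) - np.pi) < 1e-6
```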