Dear Author,
I would like to ask a question about feature encoding. In the dataset, cld is sampled completely at random via cld = dpt_xyz.reshape(-1, 3)[choose, :], and the subsequent KNN nearest neighbors are computed on this random cld. In the feature-encoding step p2r_emb = self.nearest_interpolation(p2r_emb, p2r_ds_nei_idx), p2r_ds_nei_idx likewise points into these randomly selected points. Although a mapping relationship exists, random sampling can easily produce an uneven spatial distribution. Why, then, is it valid to directly view the result into the shape of rgb_emb0 and concatenate it with rgb_emb0?
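To make sure I am reading this step correctly, here is a simplified sketch of the flow I am asking about. The shapes, the random p2r_ds_nei_idx, and the dummy nearest_interpolation are my own simplification for illustration; only the variable names follow the repo:

```python
import torch

# Simplified sketch of the step I am asking about (my own simplification,
# not the repo's exact code). Assume a 480x640 frame, so H*W = 307200
# pixels, from which n_pts points are sampled completely at random.
H, W, n_pts, c_pt, c_rgb = 480, 640, 12800, 32, 64

dpt_xyz = torch.randn(H * W, 3)                   # per-pixel XYZ from the depth map
choose = torch.randperm(H * W)[:n_pts]            # random pixel indices
cld = dpt_xyz[choose, :]                          # (n_pts, 3) randomly sampled cloud

p2r_emb = torch.randn(1, c_pt, n_pts)             # point-branch features on cld
rgb_emb0 = torch.randn(1, c_rgb, H // 2, W // 2)  # RGB features at 240x320

# p2r_ds_nei_idx: for every position of the 240x320 map, an index into cld
# (faked here with random indices just to get the shapes right).
p2r_ds_nei_idx = torch.randint(0, n_pts, (1, (H // 2) * (W // 2), 1))

def nearest_interpolation(feat, nn_idx):
    # gather, for each query position, the feature of the indexed point
    b, c, _ = feat.shape
    idx = nn_idx[..., 0].unsqueeze(1).repeat(1, c, 1)       # (B, C, 76800)
    return torch.gather(feat, 2, idx)

p2r_up = nearest_interpolation(p2r_emb, p2r_ds_nei_idx)     # (1, c_pt, 76800)
p2r_up = p2r_up.view(1, c_pt, H // 2, W // 2)               # viewed as a 240x320 map
fused = torch.cat([rgb_emb0, p2r_up], dim=1)                # channel-wise concat
print(fused.shape)                                          # torch.Size([1, 96, 240, 320])
```

The shapes obviously line up, so my question is really whether the ordering of p2r_ds_nei_idx corresponds to pixel order, so that the view (and the subsequent cat with rgb_emb0) is spatially meaningful. Two specific concerns: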
1. Since cld is completely random, the nearest neighbor computed for a given point may have nothing to do with its actual local neighborhood, especially if none of the points around it happened to be sampled.
2. If the points sampled into cld happen to be clustered, p2r_emb only carries local information, and a direct view no longer places each feature at a spatially corresponding position, so why can it still be concatenated with rgb_emb0? Take a 640×480 image and, as an extreme case, suppose all the randomly sampled points in cld fall in the first 120 rows. The 76,800 entries corresponding to p2r_emb would then only carry information from that 640×120 region, and viewing them as a 320×240 map would misalign everything, right? Meanwhile rgb_emb0, downsampled to 320×240, still covers the whole image, so directly concatenating the two also seems wrong. Could you help explain this? (I give a toy sketch of this extreme case right after this list.)
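To make the extreme case in point 2 concrete, here is a toy construction. Again only the variable names follow the repo; the nearest-neighbor lookup below is my stand-in for what I understand p2r_ds_nei_idx to encode:

```python
import torch

# Toy version of the extreme case in point 2 (my own construction; only the
# variable names follow the repo): every point of cld comes from the first
# 120 rows of a 480x640 frame.
H, W, n_pts = 480, 640, 12800
rows = torch.randint(0, 120, (n_pts,))              # rows restricted to 0..119
cols = torch.randint(0, W, (n_pts,))
choose = rows * W + cols                            # flat pixel indices, as in the dataset code
cld_uv = torch.stack([rows, cols], dim=1).float()   # (n_pts, 2) pixel coords of cld

# Stand-in for what I understand p2r_ds_nei_idx to encode: the index of the
# nearest cld point for a query position of the downsampled map.
query_uv = torch.tensor([[470.0, 200.0]])           # a pixel near the bottom of the image
nn_idx = torch.cdist(query_uv, cld_uv).argmin(dim=1)

# The feature gathered for this bottom pixel comes from a point that
# physically lies in rows 0..119, i.e. far away from the pixel itself.
print(rows[nn_idx])                                  # always < 120 in this setup
```

In this setup the gather still produces a tensor of the right shape, but the content for the bottom of the image comes entirely from points at the top of the scene, which is exactly the misalignment I am worried about.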