Dear Author,
I would like to ask a question about feature encoding. In the dataset, cld is sampled completely at random via cld = dpt_xyz.reshape(-1, 3)[choose, :], and the subsequent KNN nearest neighbors are computed on this random cld. In the feature-encoding step p2r_emb = self.nearest_interpolation(p2r_emb, p2r_ds_nei_idx), p2r_ds_nei_idx likewise points into these randomly selected points. Although a mapping relationship exists, random sampling can easily produce an uneven spatial distribution. Why, then, is it valid to directly view the result into the shape of rgb_emb0 and concatenate it with rgb_emb0?
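To make sure I am reading this step correctly, here is a simplified sketch of the flow I am asking about. The shapes, the random p2r_ds_nei_idx, and the dummy nearest_interpolation are my own simplification for illustration; only the variable names follow the repo:

```python
import torch

# Simplified sketch of the step I am asking about (my own simplification,
# not the repo's exact code). Assume a 480x640 frame, so H*W = 307200
# pixels, from which n_pts points are sampled completely at random.
H, W, n_pts, c_pt, c_rgb = 480, 640, 12800, 32, 64

dpt_xyz = torch.randn(H * W, 3)                   # per-pixel XYZ from the depth map
choose = torch.randperm(H * W)[:n_pts]            # random pixel indices
cld = dpt_xyz[choose, :]                          # (n_pts, 3) randomly sampled cloud

p2r_emb = torch.randn(1, c_pt, n_pts)             # point-branch features on cld
rgb_emb0 = torch.randn(1, c_rgb, H // 2, W // 2)  # RGB features at 240x320

# p2r_ds_nei_idx: for every position of the 240x320 map, an index into cld
# (faked here with random indices just to get the shapes right).
p2r_ds_nei_idx = torch.randint(0, n_pts, (1, (H // 2) * (W // 2), 1))

def nearest_interpolation(feat, nn_idx):
    # gather, for each query position, the feature of the indexed point
    b, c, _ = feat.shape
    idx = nn_idx[..., 0].unsqueeze(1).repeat(1, c, 1)       # (B, C, 76800)
    return torch.gather(feat, 2, idx)

p2r_up = nearest_interpolation(p2r_emb, p2r_ds_nei_idx)     # (1, c_pt, 76800)
p2r_up = p2r_up.view(1, c_pt, H // 2, W // 2)               # viewed as a 240x320 map
fused = torch.cat([rgb_emb0, p2r_up], dim=1)                # channel-wise concat
print(fused.shape)                                          # torch.Size([1, 96, 240, 320])
```

The shapes obviously line up, so my question is really whether the ordering of p2r_ds_nei_idx corresponds to pixel order, so that the view (and the subsequent cat with rgb_emb0) is spatially meaningful. Two specific concerns: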
1. Since cld is completely random, the nearest neighbor computed for a given point may have nothing to do with its actual local neighborhood, especially if none of the points around it happened to be sampled.
2. If the points sampled into cld happen to be clustered, p2r_emb only carries local information, and a direct view no longer places each feature at a spatially corresponding position, so why can it still be concatenated with rgb_emb0? Take a 640×480 image and, as an extreme case, suppose all the randomly sampled points in cld fall in the first 120 rows. The 76,800 entries corresponding to p2r_emb would then only carry information from that 640×120 region, and viewing them as a 320×240 map would misalign everything, right? Meanwhile rgb_emb0, downsampled to 320×240, still covers the whole image, so directly concatenating the two also seems wrong. Could you help explain this? (I give a toy sketch of this extreme case right after this list.)
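To make the extreme case in point 2 concrete, here is a toy construction. Again only the variable names follow the repo; the nearest-neighbor lookup below is my stand-in for what I understand p2r_ds_nei_idx to encode:

```python
import torch

# Toy version of the extreme case in point 2 (my own construction; only the
# variable names follow the repo): every point of cld comes from the first
# 120 rows of a 480x640 frame.
H, W, n_pts = 480, 640, 12800
rows = torch.randint(0, 120, (n_pts,))              # rows restricted to 0..119
cols = torch.randint(0, W, (n_pts,))
choose = rows * W + cols                            # flat pixel indices, as in the dataset code
cld_uv = torch.stack([rows, cols], dim=1).float()   # (n_pts, 2) pixel coords of cld

# Stand-in for what I understand p2r_ds_nei_idx to encode: the index of the
# nearest cld point for a query position of the downsampled map.
query_uv = torch.tensor([[470.0, 200.0]])           # a pixel near the bottom of the image
nn_idx = torch.cdist(query_uv, cld_uv).argmin(dim=1)

# The feature gathered for this bottom pixel comes from a point that
# physically lies in rows 0..119, i.e. far away from the pixel itself.
print(rows[nn_idx])                                  # always < 120 in this setup
```

In this setup the gather still produces a tensor of the right shape, but the content for the bottom of the image comes entirely from points at the top of the scene, which is exactly the misalignment I am worried about.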