Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes
- [2024.4.23] The manuscript is now available on arXiv (2404.13550).
- [2024.4.21] The supplementary material is uploaded to Google Drive.
- [2024.4.17] Our paper has been accepted by IJCAI 2024!
Despite considerable progress in point cloud geometry compression, effectively compressing large-scale scenes with sparse surfaces remains a challenge. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world applications. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high performance and extremely low decoding latency simultaneously. Inspired by the conventional Trisoup codec, a point-model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of the quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates at high speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90~160× faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9 MB), which is attractive for industrial practitioners.
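For intuition on the decoding side, the folding-based point generator mentioned above can be sketched as below. This is a toy, untrained illustration in the FoldingNet style with assumed dimensions (16-d features, 32-unit hidden layers, an 8×8 grid); it is not the actual Pointsoup architecture.

```python
import numpy as np

def folding_generator(feature, grid_size=8, rng=None):
    """Toy sketch of a folding-based point generator: a fixed 2D grid is
    concatenated with a per-window feature and 'folded' into 3D coordinates
    by two small random (untrained, illustrative) MLPs."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Fixed 2D grid: one seed location per output point.
    u, v = np.meshgrid(np.linspace(-1, 1, grid_size), np.linspace(-1, 1, grid_size))
    grid = np.stack([u.ravel(), v.ravel()], axis=1)        # (G, 2)
    feat = np.tile(feature, (grid.shape[0], 1))            # (G, F) feature repeated per seed

    def mlp(x, d_out):
        # Random, untrained weights; a real generator would learn these.
        w1 = rng.standard_normal((x.shape[1], 32)) * 0.1
        w2 = rng.standard_normal((32, d_out)) * 0.1
        return np.tanh(x @ w1) @ w2

    # First fold: (grid, feature) -> 3D; second fold refines with the feature again.
    fold1 = mlp(np.concatenate([grid, feat], axis=1), 3)   # (G, 3)
    fold2 = mlp(np.concatenate([fold1, feat], axis=1), 3)  # (G, 3)
    return fold2

pts = folding_generator(np.ones(16))
print(pts.shape)  # (64, 3)
```

Because every output point is produced by the same small network applied to a grid seed, generation is embarrassingly parallel, which is one reason folding-style decoders are fast.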
The environment we use is as follows:
- Python 3.10.14
- PyTorch 2.0.1 with CUDA 11.7
- PyTorch3D 0.7.5
- torchac 0.9.3
For the convenience of reproduction, we provide three different ways to create the environment:

```shell
# Option 1: conda environment file
conda env create -f ./environment/environment.yml

# Option 2: creation script
source ./environment/env_create.sh
```
Alternatively, a Pointsoup image has been uploaded to the CodeWithGPU community. The required environment is built instantly once you create an AutoDL container instance with our image `I2-Multimedia-Lab/Pointsoup/Pointsoup` selected from the community image list.
In our paper, point clouds with coordinates in the range [0, 1023] are used as input. Example point clouds are saved in `./data/example_pc_1023/`, and the trained model is saved in `./model/exp/`.
First and foremost, the `tmc3` binary is needed to perform predtree (predictive tree) coding on the bone points. If the `tmc3` file we provide does not work on your platform, please refer to MPEGGroup/mpeg-pcc-tmc13 to build it manually.

```shell
chmod +x ./tmc3
```
You can adjust the compression ratio simply by tuning the parameter `local_window_size`. In our paper, we use `local_window_size` values in the range of 128~2048.
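As rough intuition for why `local_window_size` acts as a rate knob (an illustrative back-of-envelope sketch, not the codec's exact rate model): the input cloud is partitioned into local windows, and each window contributes one bone point plus one skin feature to the bitstream, so a larger window size means fewer windows and hence a lower bitrate.

```python
# Illustrative only: count how many windows a cloud of N points yields
# for different local_window_size values K. The N below is an assumption.
N = 1_000_000                    # number of points in the input cloud (assumed)
for K in (128, 512, 2048):       # local_window_size values spanning the paper's range
    n_windows = -(-N // K)       # ceil(N / K)
    print(f"local_window_size={K:>4}: ~{n_windows} windows")
```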
```shell
python ./compress.py \
    --input_glob='./data/example_pc_1023/*.ply' \
    --compressed_path='./data/compressed/' \
    --model_load_path='./model/exp/ckpt.pt' \
    --local_window_size=200 \
    --tmc_path='./tmc3' \
    --verbose=True
```
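After compression, a quick way to gauge the achieved rate is bits per point (bpp). The helper below is not part of this repo; the `*.bin` extension and the fixed point count are assumptions for illustration.

```python
import glob
import os

# Hedged helper (not part of this repo): estimate the achieved rate in
# bits per point (bpp) from the size of a compressed file.
def bits_per_point(compressed_file, num_points):
    return os.path.getsize(compressed_file) * 8 / num_points

# Example usage; the file extension and point count are assumed.
for f in glob.glob('./data/compressed/*.bin'):
    print(f, f"{bits_per_point(f, num_points=1_000_000):.3f} bpp")
```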
```shell
python ./decompress.py \
    --compressed_path='./data/compressed/' \
    --decompressed_path='./data/decompressed/' \
    --model_load_path='./model/exp/ckpt.pt' \
    --tmc_path='./tmc3' \
    --verbose=True
```
We use `PccAppMetrics` for D1 PSNR calculation. You can refer to MPEGGroup/mpeg-pcc-tmc2 if the provided `PccAppMetrics` file does not fit your platform.

```shell
chmod +x ./PccAppMetrics
```
```shell
python ./eval_PSNR.py \
    --input_glob='./data/example_pc_1023/*.ply' \
    --decompressed_path='./data/decompressed/' \
    --pcc_metric_path='./PccAppMetrics' \
    --resolution=1023
```
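For intuition, a simplified point-to-point (D1) distortion can be sketched as below. This is an illustrative, one-directional reimplementation, not the official `PccAppMetrics` tool, which computes a symmetric measure with its own peak-value conventions.

```python
import numpy as np

def d1_psnr(ref, dist, resolution=1023):
    """Simplified one-directional D1 PSNR (illustrative, not PccAppMetrics):
    for each reference point, take the squared distance to its nearest
    distorted point, average into an MSE, and use the coordinate
    resolution as the peak value."""
    # Brute-force nearest neighbour; fine for small example clouds.
    d2 = ((ref[:, None, :] - dist[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise squared distances
    mse = d2.min(axis=1).mean()
    return 10 * np.log10(resolution ** 2 / mse) if mse > 0 else float('inf')

a = np.random.default_rng(0).uniform(0, 1023, (100, 3))
print(d1_psnr(a, a))  # inf for identical clouds
```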
Merits:
- High Performance - SOTA efficiency on multiple large-scale benchmarks.
- Low Decoding Latency - 90~160× faster than the conventional Trisoup decoder.
- Robust Generalizability - Applicable to large-scale scenes once trained on small objects.
- High Flexibility - Variable-rate control with a single neural model.
- Lightweight - Fairly small, with 761k parameters (about 2.9MB).
Limitations:
- Rate-distortion performance is inferior to the G-PCC Octree codec at high bitrates (e.g., bpp > 1). Surface-approximation-based approaches (Pointsoup and Trisoup) struggle to characterize accurate point positions even when given a sufficient bitrate budget.
- Coding efficacy on raw outdoor LiDAR frames is unsatisfactory. Due to the sampling & grouping strategy used, Pointsoup is limited to point clouds with relatively uniformly distributed points, such as S3DIS, ScanNet, dense point cloud maps, 8iVFB (human bodies), Visionair (objects), etc.
If you find this work useful, please consider citing our work:
```bibtex
@inproceedings{ijcai2024p595,
  title     = {Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes},
  author    = {You, Kang and Liu, Kai and Yu, Li and Gao, Pan and Ding, Dandan},
  booktitle = {Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, {IJCAI-24}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Kate Larson},
  pages     = {5380--5388},
  year      = {2024},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2024/595},
  url       = {https://doi.org/10.24963/ijcai.2024/595},
}
```