Part 1 - Point Cloud Introduction

By Mohammad Sadil Khan in Computer Vision, Deep Learning, Point Cloud, Segmentation, Graph, Voxel, MLP

June 22, 2022

1. What is a Point Cloud?

A point cloud is a set of points in 3D space that can represent either the boundary (surface) of an object or the whole object, including its interior points. The points are unordered and are not restricted to any grid, so the same object can be expressed as a point cloud in infinitely many ways (for example, by translating it). Each point has 3D coordinates and can carry a feature vector: $$P=\{(X_i,F_i)\}_{i=1}^{N},\quad X_i\in\mathbb{R}^3,\ F_i\in\mathbb{R}^d$$ where $N$ is the number of points and $d$ is the feature dimension.
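As a minimal sketch (the NumPy representation and the array sizes are illustrative assumptions, not part of any particular library), a point cloud with $N$ points and $d$-dimensional features can be stored as an $N\times 3$ coordinate array and an $N\times d$ feature array. Shuffling the rows changes the arrays but not the set of points, and translating all coordinates gives a different but equivalent description of the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 5, 4                      # number of points and feature dimension (illustrative)
X = rng.random((N, 3))           # X_i in R^3: xyz coordinates, one row per point
F = rng.random((N, d))           # F_i in R^d: per-point features (e.g. colour, normal)

# Reordering the rows of X and F together gives different arrays
# but the same set of points {(X_i, F_i)}.
perm = rng.permutation(N)
X_perm, F_perm = X[perm], F[perm]

def as_set(X, F):
    """View the point cloud as an unordered set of (coordinate, feature) tuples."""
    return {(tuple(x), tuple(f)) for x, f in zip(X, F)}

same_set = as_set(X, F) == as_set(X_perm, F_perm)

# Translating every point by the same offset describes the same shape
# with different coordinates (centring removes the offset again).
X_shifted = X + np.array([1.0, -2.0, 0.5])
same_shape = np.allclose(X_shifted - X_shifted.mean(axis=0), X - X.mean(axis=0))

print(same_set, same_shape)  # True True
```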

2. Properties of Point Clouds in $\mathbb{R}^3$

Unordered: Unlike images or arrays, a point cloud has no inherent order, and its points are not confined to a regular grid. This is a problem for CNN-type architectures, because convolution requires an ordered, regular, array-like input. Point cloud networks are therefore generally designed to be invariant to the $N!$ possible permutations of the $N$ input points.
Irregularity: Points are not sampled uniformly across an object or scene, so some objects or regions are densely sampled while others are sparse [1, 2]. This sometimes causes class imbalance problems in point cloud datasets.
Connectedness: Points are not explicitly connected as in a graph, yet neighbouring points carry meaningful spatial and geometric information about the object, so networks must learn to pass information between points.
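One common way to obtain permutation invariance is to apply the same function to every point and then combine the per-point results with a symmetric operation such as max pooling; this is the idea behind PointNet [2]. The sketch below uses a single random linear layer with ReLU (the weights and sizes are illustrative assumptions, not a trained network) and checks that shuffling the input points leaves the pooled global feature unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)

N, d_in, d_out = 1024, 3, 64
points = rng.standard_normal((N, d_in))        # one row per point

# Shared per-point layer: the same weights are applied to every point.
W = rng.standard_normal((d_in, d_out))
b = rng.standard_normal(d_out)

def global_feature(pts):
    per_point = np.maximum(pts @ W + b, 0.0)   # (N, d_out) per-point features, ReLU
    return per_point.max(axis=0)               # symmetric max pooling -> (d_out,)

g = global_feature(points)
g_shuffled = global_feature(points[rng.permutation(N)])

print(np.allclose(g, g_shuffled))  # True
```

Because the maximum over a set does not depend on the order of its elements, any permutation of the input rows yields the same global feature.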

3. Point Cloud Generation

Point clouds are generated by 3D scanners, such as time-of-flight sensors and depth cameras, or by photogrammetry software. Time-of-flight sensors emit laser pulses toward the object and measure how long the reflected light takes to return to the sensor; this round-trip time gives the distance to each point on the object's surface.
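Two pieces of arithmetic sit behind these sensors. A time-of-flight measurement converts the round-trip time $t$ of a light pulse into a distance $c\,t/2$, since the pulse travels to the surface and back. A depth camera produces a depth image, which can be back-projected into a point cloud with the pinhole camera model. The sketch below assumes an ideal pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, no lens distortion, and a depth value of 0 marking invalid pixels.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_time_s):
    """Distance to the surface: the pulse travels there and back, so halve the path."""
    return C * round_trip_time_s / 2.0

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, metres) into an (M, 3) point cloud.

    Assumes an ideal pinhole camera; pixels with depth 0 are treated as invalid.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                       # drop invalid (zero-depth) pixels

# Hypothetical 2x2 depth image and intrinsics, for illustration only.
depth = np.array([[1.0, 0.0],
                  [2.0, 2.0]])
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud.shape)                     # (3, 3)
print(round(tof_distance(20e-9), 3))   # 2.998 (metres for a 20 ns round trip)
```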

4. Point Cloud Sampling

Point cloud sampling is the process of choosing a subset of points from a large point cloud. Sampling is used in segmentation models such as RandLA-Net [3] to reduce the number of points and speed up learning. It is an essential step in large-scale point cloud processing, since learning features for every point can be time-consuming. Instead, features can be learned for a small sampled subset, and the features of the remaining points can be aggregated from their neighbours. Several sampling algorithms are available. Let $N$ be the number of points, $M$ the number of sampled points ($N>M$), $D$ the maximum number of points in a 3D voxel grid ($N\gg D$), and $K$ the number of nearest neighbours ($N\gg K$).
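As a minimal sketch of learning features on a subset and aggregating them for the remaining points, the code below gives every original point the feature of its nearest sampled point (nearest-neighbour interpolation, in the spirit of the up-sampling used in RandLA-Net [3]). The random subset, the stand-in feature values and the brute-force distance matrix are illustrative assumptions; for large $N$ a KD-tree or GPU neighbour search would replace the $O(NM)$ distance computation.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, d = 1000, 100, 8
points = rng.random((N, 3))

# Pretend features were learned only for a random subset of M points.
sampled_idx = rng.choice(N, size=M, replace=False)
sampled_feats = rng.random((M, d))   # stand-in for learned features

def propagate_nearest(points, sampled_pts, sampled_feats):
    """Give every point the feature of its nearest sampled point (brute force, O(N*M))."""
    # (N, M) matrix of squared distances between all points and the sampled points
    d2 = ((points[:, None, :] - sampled_pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)       # index into the sampled subset, shape (N,)
    return sampled_feats[nearest]     # (N, d)

all_feats = propagate_nearest(points, points[sampled_idx], sampled_feats)
print(all_feats.shape)  # (1000, 8)
```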

$\textbf{1. Heuristic Sampling}$

Grid Sampling: In grid sampling, a 3D voxel grid is laid over the point cloud and each occupied voxel is reduced to a single point, for example the average of the points it contains (or, for labels, the most frequently occurring class). This produces a spatially uniform sample. The time complexity of grid sampling is $O(ND)$. Because points are averaged within each voxel, grid sampling loses smooth boundary information.
Random Sampling: Random sampling, one of the simplest sampling methods, takes $M$ random points from a point cloud of $N$ points ($N>M$). Its time complexity is $O(M)$, which makes it efficient for large-scale point cloud networks.
Farthest Point Sampling (FPS): FPS iteratively builds a set of points $P=\{p_1,p_2,\cdots,p_M\}$, where each new point $p_j$ is the point farthest from the $j-1$ points already selected (i.e., the point whose distance to its nearest selected point is largest). The time complexity is $O(M^2N)$, which makes it unsuitable for large-scale point cloud processing.
Inverse Density Importance Sampling (IDIS): In IDIS, the density around each point is estimated from the distances between the point and its $K$ nearest neighbours: $$\delta(x)=\sum_{y\in \mathrm{KNN}(x)} \lVert x-y \rVert_2^2$$ A large $\delta(x)$ means the neighbours are far away, i.e. the point lies in a sparse region, so $\delta(x)$ acts as an inverse density. The $N$ points are reordered by this inverse density and the top $M$ points are selected, which means points in low-density regions are more likely to be chosen than points in dense regions. The time complexity is $O((K+N)\log N)$. This sampling can control density but is sensitive to outliers and noise.
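The four heuristic samplers above can be sketched in a few lines of NumPy. These are illustrative implementations under simple assumptions, not the reference code of any paper: grid sampling keeps the mean of each occupied voxel (`voxel_size` is a hypothetical parameter), FPS keeps a running array of each point's distance to the selected set (so this incremental version costs $O(MN)$ instead of recomputing distances to every selected point), and IDIS uses a brute-force $K$-nearest-neighbour search with the summed squared distances from the formula above as its inverse-density score.

```python
import numpy as np

def grid_sample(points, voxel_size):
    """Replace the points in each occupied voxel by their average."""
    voxel = np.floor(points / voxel_size).astype(np.int64)        # voxel index per point
    _, inverse, counts = np.unique(voxel, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)                                  # flat (N,) voxel id per point
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)                               # sum the points per voxel
    return sums / counts[:, None]

def random_sample(points, m, rng):
    """Pick m distinct points uniformly at random."""
    return points[rng.choice(len(points), size=m, replace=False)]

def farthest_point_sample(points, m, rng):
    """Greedy FPS: each new point is the one farthest from all points chosen so far."""
    n = len(points)
    chosen = np.empty(m, dtype=np.int64)
    chosen[0] = rng.integers(n)                                    # arbitrary start point
    min_d2 = ((points - points[chosen[0]]) ** 2).sum(axis=1)       # squared distance to chosen set
    for j in range(1, m):
        chosen[j] = min_d2.argmax()
        d2 = ((points - points[chosen[j]]) ** 2).sum(axis=1)
        min_d2 = np.minimum(min_d2, d2)                            # update distance to chosen set
    return points[chosen]

def idis_sample(points, m, k):
    """Keep the m points with the largest summed squared distance to their k neighbours."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # (N, N), brute force
    knn_d2 = np.sort(d2, axis=1)[:, 1:k + 1]                            # skip the point itself
    score = knn_d2.sum(axis=1)                                          # large = sparse region
    return points[np.argsort(-score)[:m]]

rng = np.random.default_rng(0)
pts = rng.random((500, 3))
print(grid_sample(pts, 0.25).shape[0] <= 64)      # True: at most 4**3 voxels in the unit cube
print(random_sample(pts, 50, rng).shape)          # (50, 3)
print(farthest_point_sample(pts, 50, rng).shape)  # (50, 3)
print(idis_sample(pts, 50, k=8).shape)            # (50, 3)
```

The brute-force distance matrices keep the sketch short; they need $O(N^2)$ memory in IDIS, which is why practical implementations use spatial data structures for the neighbour search.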

$\textbf{2. Learning Based Sampling}$

Generator Based Sampling: Generator Based Sampling (GS) learns to generate a small subset of points from the original point cloud. For a point cloud $P$ and a task $T$, GS tries to find a subset $S \subset P$ that minimizes an objective function $f$: $$S^*=\arg\min_{S\subset P} f\big(T(S)\big)$$ It is an end-to-end trainable model. However, at the inference stage it uses FPS to match the generated subset to the original point cloud, and it is slow: it takes up to 20 minutes to sample 10% of $10^6$ points.
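Gumbel Subset Sampling: Gumbel Subset Sampling [4] uses an attention mechanism to choose a representative, task-specific subset of the point cloud. Given an input set $X_i \in \mathbb{R}^{N_i\times c}$, the task is to choose a suitable subset $X_{i+1} \in \mathbb{R}^{N_{i+1}\times c}$ with $N_{i+1} \leq N_i$: $$X_{i+1}=y\cdot X_i,\quad y=\mathrm{softmax}(WX_i^T),\quad W \in \mathbb{R}^{N_{i+1}\times c}$$ Each row of $y$ is a distribution over the $N_i$ input points; during training the softmax is replaced by a Gumbel-softmax so that the selection stays differentiable while each row approaches a one-hot choice. It is completely end-to-end learnable and can be used in any segmentation network.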
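A forward pass of this selection can be sketched in NumPy. This is a simplified illustration, not the authors' implementation: it assumes $W \in \mathbb{R}^{N_{i+1}\times c}$ so that $WX_i^T$ is an $N_{i+1}\times N_i$ score matrix, `W` is a random stand-in for the learned weight, and `tau` is a hypothetical temperature parameter. The soft path uses the standard Gumbel-softmax relaxation (add Gumbel noise to the scores, divide by the temperature, apply softmax); the hard path takes a one-hot argmax per row so that every output row is an actual input point. In a real network the same computation would be written in an autodiff framework so that `W` can be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gumbel_subset_sample(X, W, tau=1.0, hard=False, rng=rng):
    """Select N_out rows of X (N_in x c) using scores W @ X.T (N_out x N_in).

    W has shape (N_out, c); here it is random, in practice it would be learned.
    """
    logits = W @ X.T                                     # (N_out, N_in) selection scores
    if hard:
        # Hard selection: one-hot argmax per row, so each output is an input point.
        y = np.zeros_like(logits)
        y[np.arange(len(logits)), logits.argmax(axis=1)] = 1.0
    else:
        # Soft relaxation: Gumbel noise plus temperature-scaled softmax.
        u = rng.uniform(1e-12, 1.0, size=logits.shape)  # avoid log(0)
        g = -np.log(-np.log(u))
        y = softmax((logits + g) / tau, axis=1)
    return y @ X                                         # (N_out, c)

N_in, N_out, c = 256, 64, 16
X = rng.standard_normal((N_in, c))
W = rng.standard_normal((N_out, c))

X_soft = gumbel_subset_sample(X, W, tau=0.5)
X_hard = gumbel_subset_sample(X, W, hard=True)
print(X_soft.shape, X_hard.shape)  # (64, 16) (64, 16)
```

Lower values of `tau` push each row of the soft selection closer to one-hot, at the cost of noisier gradients during training.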

5. Bibliography

[1] Anh Nguyen, Bac Le. 3D Point Cloud Segmentation: A Survey. 2013 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), 2013, pp. 225-230.
[2] Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 77-85.
[3] Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, A. Trigoni, A. Markham. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[4] Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian. Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3318-3327, doi: 10.1109/CVPR.2019.00344.
[5] M. Fan. Variants of Seeded Region Growing. IET Image Processing, June 2015.
[6] Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition. 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 945-953.
[7] Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang. Review: Deep Learning on 3D Point Clouds. Remote Sensing 12, no. 11: 1729.
[8] Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou. Tangent Convolutions for Dense Prediction in 3D. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[9] Zhijian Liu, Haotian Tang, Yujun Lin, Song Han. Point-Voxel CNN for Efficient 3D Deep Learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.
[10] Charles R. Qi, Li (Eric) Yi, Hao Su, Leonidas J. Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Curran Associates Inc., Red Hook, NY, USA, pp. 5105-5114.
[11] H. Zhao, L. Jiang, C.-W. Fu, J. Jia. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5560-5568, doi: 10.1109/CVPR.2019.00571.