Part 2 - Point Cloud Segmentation
By Mohammad Sadil Khan in Computer Vision, Deep Learning, Point Cloud Segmentation, Graph, Voxel, MLP
June 28, 2022
Point Cloud Segmentation Methods
Point cloud segmentation is the task of grouping points into objects or assigning a label to every point in the point cloud. It is one of the most challenging tasks and an active research topic in deep learning, since point clouds are noisy, unstructured, and lack connectivity information. The methods fall into four broad categories.
1. Edge Based Methods
Edges describe the intrinsic characteristics of the boundaries of a 3D object. Edge-based methods locate the points where the surface changes rapidly within a neighbourhood. Bhanu et al. [1] proposed three approaches for detecting the edges of a 3D object. The first approach computes gradients of the range image $z(x, y)$: let $g_x = \partial z / \partial x$ and $g_y = \partial z / \partial y$ be the partial derivatives of depth along the two image axes.
For flat surfaces these values are zero; they are positive when edges are convex and negative when edges are concave. The maximum magnitude of the gradient is $g = \sqrt{g_x^2 + g_y^2}$.
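The gradient criterion can be sketched in a few lines of NumPy. This is an illustrative toy, not Bhanu's original implementation: the function name `gradient_edges` and the threshold value are assumptions for the example.

```python
import numpy as np

# Toy sketch of gradient-based edge detection on a range image z(x, y):
# flat regions give near-zero gradient magnitude, while depth
# discontinuities give large magnitudes. Names and threshold are
# illustrative assumptions, not from the original paper.
def gradient_edges(z, threshold=0.5):
    """Return a boolean mask of edge pixels in a range image `z`."""
    gx, gy = np.gradient(z)               # partial derivatives along the two axes
    magnitude = np.sqrt(gx**2 + gy**2)    # maximum gradient magnitude
    return magnitude > threshold

# A step in depth creates an edge; the flat parts do not.
z = np.zeros((8, 8))
z[:, 4:] = 2.0                            # depth discontinuity at column 4
edges = gradient_edges(z)
```

The edge mask fires only around the depth step, which is exactly the behaviour the criterion above describes.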
2. Region Based Methods
Region-based methods use neighbourhood information to group points that are similar, thereby recovering coherent 3D objects while maximizing the dissimilarity between different objects. Compared to edge-based methods they are less sensitive to noise and outliers, but they suffer from inaccurate border segmentation. There are two types of region-based methods.
- Seeded-region methods (bottom-up): Seeded region segmentation is a fast, effective, and robust segmentation method. It starts by choosing, manually or automatically in a preprocessing step, a set of seeds (a point or a set of points), and then gradually adds neighbouring points whenever they satisfy certain similarity conditions [1,5]. The process finishes when every point belongs to a region.
Suppose there are $N$ seeds chosen initially. Let $R = \{R_1, R_2, \dots, R_N\}$ be the set of seed regions. Let $T$ be the set of points that do not yet belong to any region but are adjacent to at least one point in some region, $T = \{x \notin \bigcup_i R_i \mid N(x) \cap \bigcup_i R_i \neq \emptyset\}$, where $N(x)$ is the set of neighbourhood points of $x$. At each step a point $x \in T$ is added to a region if certain conditions are met. One such condition compares the intensity value of $x$ with the average intensity value of each adjacent region; the region with the minimum difference is assigned to the point. In another variant, the greyvalue of a point is approximated by fitting a linear model: if the coordinate of a pixel/point is $(p, q)$, its greyvalue is modelled as $g(p, q) = a + bp + cq + \epsilon$, where $\epsilon$ is the error term. The homogeneity condition is then to find the minimum distance between the average approximated greyvalue of a region and the approximated greyvalue of $x$. Seeded segmentation depends heavily on the choice of seed points; inaccurate choices often lead to under-segmentation or over-segmentation.
- Unseeded-region methods (top-down): Unlike seeded methods, unseeded methods take a top-down approach. Segmentation starts with all points grouped into one region. Then the difference between the region's mean value and a chosen point's value is calculated: if it is below a threshold, the point is kept; otherwise the point is considered different from the rest, removed from the old region, and placed into a newly created region. The challenges are over-segmentation and the need for domain knowledge, which is not available in complex scenes [1].
3. Attribute Based Methods
Attribute-based methods use the idea of clustering. The approach is to calculate attributes for the points and then use a clustering algorithm to perform segmentation. The challenges in these methods are finding a suitable attribute that contains the necessary information for segmentation and defining a proper distance metric. Possible attributes include normal vectors, distance, point density, or surface texture measures. It is a robust approach, but it performs poorly when the point cloud is large-scale and the attributes are multidimensional [1].
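As a toy illustration of this pipeline, the sketch below uses a single assumed attribute (point height) and a small hand-rolled two-means clustering; in practice the attribute vector would include normals, density, etc., and any standard clustering algorithm could be used.

```python
import numpy as np

# Attribute-based segmentation sketch: compute a per-point attribute and
# cluster in attribute space. The attribute (height z) and the two-means
# routine are illustrative assumptions.
def two_means(attrs, iters=10):
    # initialise with attrs[0] and the attribute vector farthest from it
    c0 = attrs[0]
    c1 = attrs[np.argmax(np.linalg.norm(attrs - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        # assign each point to the nearest attribute-space center
        d = np.linalg.norm(attrs[:, None, :] - centers[None], axis=-1)
        assign = d.argmin(axis=1)
        centers = np.stack([attrs[assign == j].mean(axis=0) for j in (0, 1)])
    return assign

# Ground-plane points (z = 0) vs. an elevated object (z = 3).
attrs = np.array([[0.0]] * 20 + [[3.0]] * 20)
seg = two_means(attrs)
```

Clustering on the height attribute cleanly separates the ground from the elevated object; with a poor attribute choice the same algorithm would fail, which is exactly the sensitivity discussed above.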
4. Deep Learning Based Methods
The main challenge in point cloud segmentation is finding a good latent vector that contains sufficient information for the segmentation task. Deep learning methods offer the best solution for learning such representations. Neural networks, being universal approximators, can theoretically approximate the target function for segmentation. The following theorem justifies how MLPs can approximate the function for the segmentation task given enough neurons.
Theorem 1 [2]: Suppose $f : \mathcal{X} \to \mathbb{R}$ is a continuous set function on point sets with respect to the Hausdorff distance $d_H(\cdot, \cdot)$. Then for every $\epsilon > 0$ there exist a continuous function $h$ and a symmetric function $g(x_1, \dots, x_n) = \gamma \circ \mathrm{MAX}$ such that for any point set $S$, $\big|f(S) - \gamma\big(\mathrm{MAX}_{x_i \in S}\{h(x_i)\}\big)\big| < \epsilon$, where $\mathrm{MAX}$ is an element-wise max-pooling operator and $\gamma$ is a continuous function.
Proof (sketch): The Hausdorff distance is defined by $d_H(X, Y) = \max\big\{\sup_{x \in X} \inf_{y \in Y} \|x - y\|,\ \sup_{y \in Y} \inf_{x \in X} \|x - y\|\big\}$. Continuity of $f$ with respect to $d_H$ means that small perturbations of the point set change $f$ only slightly, so $f$ can be approximated on a fine discretization of the input space; the complete argument is given in the PointNet paper [2].
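The Hausdorff distance used in the theorem can be computed directly from its definition for two finite point sets:

```python
import numpy as np

# Two-sided Hausdorff distance between finite point sets X and Y,
# written directly from the definition: the larger of the two directed
# "farthest nearest-neighbour" distances.
def hausdorff(X, Y):
    # pairwise Euclidean distances, shape (|X|, |Y|)
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.0], [3.0, 0.0]])
# hausdorff(X, Y) = 2.0: the point (3, 0) is 2 away from its nearest
# neighbour (1, 0) in X, which dominates both directed distances.
```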
The DL methods for point cloud segmentation can be divided into the following categories.
A. Projection-Based Networks
Following the success of 2D CNNs, projection-based networks project the 3D point cloud into 2D images from various views/angles. 2D CNN techniques are then applied to learn feature representations, and finally the features from the multiple views are aggregated for the final output [6,7]. In [8], tangent convolutions are used: for every point a tangent plane is estimated, and the tangent convolution operates on the projection of the local surface geometry onto that plane. This gives a tangent image, a regular 2D grid on which standard convolutions can be applied.
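The core projection idea can be shown with a minimal orthographic depth-map sketch: project points along one axis into an image grid that a 2D CNN could consume. The resolution, the unit bounding box, and the function name are assumptions for the example; real multi-view pipelines render many such views.

```python
import numpy as np

# Minimal sketch of projecting a point cloud into a 2D depth image:
# orthographic projection along -z, keeping the nearest point per pixel.
# Resolution and the assumed [0, 1) x-y extent are illustrative.
def depth_image(points, H=4, W=4):
    img = np.full((H, W), np.inf)
    # map x, y in [0, 1) to pixel indices
    cols = np.clip((points[:, 0] * W).astype(int), 0, W - 1)
    rows = np.clip((points[:, 1] * H).astype(int), 0, H - 1)
    for r, c, z in zip(rows, cols, points[:, 2]):
        img[r, c] = min(img[r, c], z)     # keep the closest surface
    return img

pts = np.array([[0.1, 0.1, 2.0], [0.12, 0.1, 1.0], [0.9, 0.9, 3.0]])
img = depth_image(pts)
```

Two points landing in the same pixel are resolved in favour of the nearer one, mimicking visibility in a rendered view; pixels with no points stay empty.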
B. Voxel-Based Networks
Voxel-based methods convert the 3D point cloud into a voxel image. Figure [1] shows an example. The points that make up the point cloud are unstructured and unordered, but CNNs require a regular grid for the convolution operation.

Voxelization is done in the following steps.
- A bounding box of the point cloud is calculated which defines the entire space that is to be divided.
- Then the space is divided into a fixed-size grid; each cell is called a 3D cuboid.
- The points are assigned to the grid cells, with each 3D cuboid containing several points; these 3D cuboids become the voxels, each representing its subset of points.
- Features are calculated from the subset of points inside a voxel.
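The four steps above can be sketched with NumPy. The grid resolution and the choice of centroid as the per-voxel feature are assumptions for the example.

```python
import numpy as np

# Voxelization sketch following the listed steps: bounding box,
# fixed-size grid, per-voxel point grouping, and a simple mean feature.
def voxelize(points, grid=(2, 2, 2)):
    lo, hi = points.min(axis=0), points.max(axis=0)    # 1. bounding box
    size = (hi - lo) / np.array(grid)                  # 2. cuboid edge lengths
    idx = np.floor((points - lo) / size).astype(int)
    idx = np.minimum(idx, np.array(grid) - 1)          # clamp points on the max face
    voxels = {}
    for i, key in enumerate(map(tuple, idx)):          # 3. group points by cuboid
        voxels.setdefault(key, []).append(points[i])
    # 4. feature per occupied voxel: centroid of its points
    return {k: np.mean(v, axis=0) for k, v in voxels.items()}

pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.1], [1.0, 1.0, 1.0]])
feats = voxelize(pts)
```

Only occupied voxels are stored (a sparse representation); a dense 3D tensor for a CNN would be filled from this dictionary.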

C. Point-Based Networks
Point-based networks work on raw point cloud data; they require neither voxelization nor projection. PointNet is a breakthrough network that takes raw point clouds as input and outputs a label for every point. It uses permutation-invariant operations, pointwise MLPs and a symmetric max-pooling layer for feature aggregation, and it achieved state-of-the-art performance on benchmark datasets. However, PointNet lacks local dependency information and so does not capture local structure: the max-pooling layer captures the global structure but loses distinct local information. Inspired by PointNet, many new networks have been proposed to learn local structure. PointNet++ extends the PointNet architecture with a local structure learning method that follows three basic steps, (1) sampling, (2) grouping, and (3) a feature aggregation layer (Section 3.3.1.E lists some feature aggregation functions), to aggregate information from the points in each neighbourhood. Sampling chooses a subset of points (e.g. via farthest point sampling) to act as centroids of local regions, and grouping collects the neighbouring points around each centroid.
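The sampling and grouping steps can be sketched in NumPy. This follows the PointNet++ recipe (farthest point sampling plus ball query) in spirit, but the starting point, radius, and function names are assumptions for the example.

```python
import numpy as np

# Sketch of PointNet++-style sampling and grouping: farthest point
# sampling picks well-spread centroids, then a ball query collects each
# centroid's neighbours for the feature aggregation step.
def farthest_point_sampling(points, k):
    chosen = [0]                                   # start from point 0 (arbitrary)
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())                   # farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

def ball_query(points, center, radius):
    return np.where(np.linalg.norm(points - center, axis=1) <= radius)[0]

# Two well-separated pairs of points: FPS picks one centroid per pair.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
centroids = farthest_point_sampling(pts, 2)
group = ball_query(pts, pts[centroids[1]], radius=0.5)
```

A symmetric function (e.g. max-pooling over a pointwise MLP, as in PointNet) would then aggregate the features of each group into one vector per centroid.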
D. Graph-Based Networks
A point cloud is unstructured, unordered, and has no connectivity, but it can be transformed into a graph by adding edges between neighbouring points. Graph structures are well suited to modelling correlation and dependency among points through edges. GNN-based networks use the ideas of graph construction, local structure learning with an expanding receptive field, and a global summary structure. Point-GNN creates a graph using KNN and applies pointwise MLPs followed by feature aggregation, updating vertex features along with edge features at every iteration. Landrieu introduces superpoint graphs to segment large-scale point clouds: the cloud is first partitioned into geometrically simple objects (e.g. planes, cubes) in an unsupervised manner, and graph convolutions are then applied for contextual segmentation. DGCNN introduces the edge convolution (EdgeConv) operation, which dynamically updates the vertices and edges, and thus the graph itself. Ma creates a Point Global Context Reasoning module that captures global contextual information from the output of any segmentation network by building a graph over the output embedding vectors.
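The graph-construction-plus-EdgeConv idea can be sketched as follows. The learned MLP of DGCNN is replaced here by a fixed stand-in (a sum over the edge feature), so this shows only the data flow, not a trained network.

```python
import numpy as np

# Sketch of turning a point cloud into a k-NN graph and applying an
# EdgeConv-style update: each vertex takes a max over a function of
# (its feature, neighbour offset). The "MLP" here is a fixed sum,
# purely for illustration of the wiring.
def knn(points, k):
    d = np.linalg.norm(points[:, None] - points[None], axis=-1)
    # nearest k neighbours per point, excluding the point itself (column 0)
    return d.argsort(axis=1)[:, 1:k + 1]

def edge_conv(feats, neighbors):
    out = np.empty_like(feats)
    for i, nbrs in enumerate(neighbors):
        edge_feats = [np.concatenate([feats[i], feats[j] - feats[i]]) for j in nbrs]
        # stand-in for a learned MLP: sum of the edge feature entries
        out[i] = max(ef.sum() for ef in edge_feats)
    return out

pts = np.array([[0.0], [1.0], [2.0]])
nbrs = knn(pts, k=1)
h = edge_conv(pts, nbrs)
```

Recomputing `knn` on the updated features `h` (rather than on coordinates) is what makes the graph "dynamic" in DGCNN: points that become similar in feature space get connected in later layers.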
E. Transformer and Attention-Based Networks
Transformers and the attention mechanism were a major breakthrough in NLP tasks, which has led to research on attention mechanisms in 2D CNNs. Attention follows the following derivation.
$e_{ij} = \mathrm{LeakyReLU}\big(a^{T}[W h_i \,\|\, W h_j]\big), \quad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}, \quad h_i' = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij} W h_j\Big)$ ($\|$ denotes concatenation)
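As a concrete, minimal variant of the attention idea, the sketch below implements plain scaled dot-product self-attention over per-point features (the core operation in transformer-based point networks, without the learned projections and concatenation of the formulation above): every point attends to every other point, so the receptive field is global in a single layer.

```python
import numpy as np

# Scaled dot-product self-attention on per-point features. This is the
# generic transformer operation, shown without learned weight matrices
# for simplicity; Q = K = V = the raw feature matrix here.
def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (N, N) pairwise affinities
    A = softmax(scores, axis=-1)              # each row sums to one
    return A @ V, A

# Toy example: 3 points with 2-dim features; points 0 and 2 are identical.
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
out, A = attention(F, F, F)
```

Point 0 attends more strongly to the identical point 2 than to the dissimilar point 1, illustrating how attention weights encode feature similarity.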
5. Bibliography
1. Anh Nguyen, Bac Le. 3D Point Cloud Segmentation: A Survey. 2013 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), 2013, pp. 225-230.
2. Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 77-85.
3. Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, A. Trigoni, A. Markham. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
4. Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian. Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3318-3327, doi: 10.1109/CVPR.2019.00344.
5. M. Fan. Variants of Seeded Region Growing. IET Image Processing, June 2015.
6. Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. Multi-view Convolutional Neural Networks for 3D Shape Recognition. 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 945-953.
7. Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang. Review: Deep Learning on 3D Point Clouds. Remote Sensing 12, no. 11: 1729.
8. Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, Qian-Yi Zhou. Tangent Convolutions for Dense Prediction in 3D. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
9. Zhijian Liu, Haotian Tang, Yujun Lin, Song Han. Point-Voxel CNN for Efficient 3D Deep Learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.
10. Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Curran Associates Inc., Red Hook, NY, USA, pp. 5105-5114.
11. H. Zhao, L. Jiang, C.-W. Fu, J. Jia. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5560-5568, doi: 10.1109/CVPR.2019.00571.
12. Weijing Shi, Ragunathan (Raj) Rajkumar. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
13. Loic Landrieu, Martin Simonovsky. Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
14. Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon. Dynamic Graph CNN for Learning on Point Clouds. ACM Transactions on Graphics (TOG).
15. Yanni Ma, Yulan Guo, Hao Liu, Yinjie Lei, Gongjian Wen. Global Context Reasoning for Semantic Segmentation of 3D Point Clouds. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
16. Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip H. S. Torr, Vladlen Koltun. Point Transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision.
17. Jianhui Yu, Chaoyi Zhang, Heng Wang, Dingxin Zhang, Yang Song, Tiange Xiang, Dongnan Liu, Weidong Cai. 3D Medical Point Transformer: Introducing Convolution to Attention Networks for Medical Point Cloud Analysis. arXiv:2112.04863, 2021.
18. Mingtao Feng, Liang Zhang, Xuefei Lin, Syed Zulqarnain Gilani, Ajmal Mian. Point Attention Network for Semantic Segmentation of 3D Point Clouds.