RandLaNet
Large-Scale Point Cloud Segmentation Network
By Mohammad Sadil Khan in Deep Learning Point Cloud Segmentation Encoder Decoder
May 12, 2022
1. Point Cloud
A. Introduction
A point cloud is a set of points in 3D space that can represent either the boundary of an object or the whole object (including interior points). The points are unordered and are not restricted to any grid, which means the same object can be expressed in infinitely many ways (for example, by translating it). Each point can have 3D coordinates and a feature vector.
B. Properties of Point Cloud
Unlike images or arrays, a point cloud is unordered and is not confined within a fixed boundary. This is a problem for CNN-type architectures, because convolution requires an ordered, regular, array-like representation of the input. Point cloud networks are therefore generally designed to be invariant to permutations of the input points. Points are also not sampled uniformly, so some objects can be densely covered while others are sparse [1, 2], which sometimes causes class imbalance in point cloud datasets. Finally, points are not connected as in a graph structure, yet neighbouring points carry meaningful spatial and geometric information about the object, so networks must learn to pass information from point to point.
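To make the permutation property concrete, here is a minimal NumPy sketch. It assumes a toy network made of one shared per-point linear layer followed by max pooling; the weights `W`, the function `global_feature` and the random points are all made up for illustration and are not taken from any specific architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy point cloud: 6 points with xyz coordinates (values are made up).
points = rng.normal(size=(6, 3))

# Hypothetical shared weights: the same linear map + ReLU is applied to every point.
W = rng.normal(size=(3, 4))

def global_feature(pts):
    per_point = np.maximum(pts @ W, 0.0)  # shared per-point transform, shape (N, 4)
    return per_point.max(axis=0)          # symmetric pooling over the point axis

# Same points, different order.
shuffled = points[rng.permutation(len(points))]

same = np.allclose(global_feature(points), global_feature(shuffled))
print(same)
```

Because max pooling is a symmetric function, the pooled feature does not depend on the order in which the points are listed.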
2. RandLaNet - Architecture
Large-scale point cloud segmentation is a challenging task because of its huge computational requirements and the difficulty of learning effective embeddings. RandLa-Net [3] is an efficient and lightweight neural architecture that segments every point in large-scale point clouds. It is an encoder-decoder-like architecture that downsamples the input point cloud with random sampling in the encoder and upsamples it again in the decoder blocks. It uses random sampling rather than other sampling methods because it is faster to compute. Although random sampling can discard key points that are needed for accurate segmentation, RandLa-Net uses attention-based local feature aggregation to share the features of removed points with their neighboring points. Figure 1 shows the architecture of RandLa-Net.

A. Random Sampling
Compared to other sampling methods, Random sampling is extremely fast (time complexity
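A minimal NumPy sketch of random downsampling is given here. The function name `random_downsample` and the downsampling ratio of 4 are assumptions made for illustration. The point to notice is that the selection never looks at the coordinates, so no distances have to be computed.

```python
import numpy as np

def random_downsample(points, features, ratio=4, rng=None):
    """Keep N // ratio points chosen uniformly at random, without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    n_keep = max(1, points.shape[0] // ratio)
    idx = rng.choice(points.shape[0], size=n_keep, replace=False)
    return points[idx], features[idx], idx

rng = np.random.default_rng(0)
pts = rng.uniform(size=(1024, 3))    # toy coordinates
feats = rng.normal(size=(1024, 8))   # toy per-point features

sub_pts, sub_feats, idx = random_downsample(pts, feats, ratio=4, rng=rng)
print(sub_pts.shape, sub_feats.shape)  # (256, 3) (256, 8)
```

The returned `idx` is kept so that the same subset can be applied to any other per-point array, such as labels.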

B. Architecture
RandLa-Net consists of 4 encoder and 4 decoder layers (Figure 1). Each encoder layer consists of Local Feature Aggregation (LFA) modules (shown in the bottom panel of Figure 3). LFA modules aggregate local features and gradually expand the receptive field to perform global feature passing. Every LFA module is followed by a random sampling step. Let the input shape be
3. RandLaNet - LFA
The Local Feature Aggregation (LFA) module follows a three-step message passing system. Since point clouds don't have connectivity information, LFA ensures features are shared between points. In Figure 1, the LFA module in the first encoder transforms the feature vector (

The first step in the message passing system is to decide which points should pass a message to the red point in Figure 3. K-Nearest Neighbor is used to find the neighbor points (blue points) that will share their features with the red point. Once the points are chosen, we need to generate the message sent from each blue point to the red point. For every neighbor point, we generate a message by incorporating the distance and spatial information using an MLP. This MLP also gives the feature vector the desired dimension. There are several ways to combine the features from neighbor points. We could use a MAX, AVG or SUM function, but a better option is a weighted linear sum of the features, with the weights learned by the model. These weights are the attention scores. They give more weight during aggregation to points of a similar nature or points belonging to the same object.
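The three steps described above (find neighbors, build messages, aggregate with learned weights) can be sketched in NumPy as follows. This is only an illustration under several assumptions: the neighbor search is brute force, the matrices `W_pos` and `W_att` are random stand-ins for trained MLP weights, and the scores are normalised with a softmax over the K neighbors, which the text above does not specify.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force K nearest neighbors; returns an (N, k) index array."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    return np.argsort(dist2, axis=1)[:, :k]

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, K, D = 100, 8, 16
points = rng.uniform(size=(N, 3))     # toy coordinates
features = rng.normal(size=(N, D))    # toy per-point features

# Step 1: choose the K neighbors of every point.
idx = knn_indices(points, K)                          # (N, K)

# Step 2: build one message per neighbor from its feature,
# its relative position and its distance to the center point.
rel = points[idx] - points[:, None, :]                # (N, K, 3)
dist = np.linalg.norm(rel, axis=-1, keepdims=True)    # (N, K, 1)
spatial = np.concatenate([rel, dist], axis=-1)        # (N, K, 4)
W_pos = rng.normal(size=(4, D)) * 0.1                 # stand-in for the MLP
encoded = np.maximum(spatial @ W_pos, 0.0)            # (N, K, D)
messages = np.concatenate([features[idx], encoded], axis=-1)  # (N, K, 2D)

# Step 3: attention-weighted sum instead of MAX / AVG / SUM.
W_att = rng.normal(size=(2 * D, 2 * D)) * 0.1         # stand-in for learned weights
scores = softmax(messages @ W_att, axis=1)            # normalised over the K neighbors
aggregated = (scores * messages).sum(axis=1)          # (N, 2D)

print(aggregated.shape)  # (100, 32)
```

For comparison, `messages.max(axis=1)` or `messages.mean(axis=1)` would give the MAX and AVG variants. The weighted sum differs only in that the contribution of each neighbor is computed from the data rather than fixed in advance.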

Let
Attentive pooling aggregates the set of neighboring point features
Since the point cloud is downsampled, it is necessary to expand the receptive field to preserve geometric details. Inspired by the ResNet architecture, the authors stack several Local Spatial Encoding (LSE) and attentive pooling units in one block before downsampling. In Figure 6, the red points observe
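The effect of stacking can be illustrated with a small NumPy sketch that counts how many points can influence one center point after one and after two neighbor aggregations. The value K = 4, the random points and the brute-force `knn_indices` helper are assumptions chosen for illustration. In this sketch the center sees exactly K points after one step and at most K * K points after two.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force K nearest neighbors; returns an (N, k) index array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.argsort((diff ** 2).sum(axis=-1), axis=1)[:, :k]

rng = np.random.default_rng(0)
K = 4
points = rng.uniform(size=(200, 3))   # toy coordinates
idx = knn_indices(points, K)

center = 0
# Points that contribute to the center after one aggregation step.
one_hop = set(idx[center].tolist())
# After a second step, each of those points has already absorbed its own neighbors.
two_hop = set(idx[list(one_hop)].ravel().tolist())

print(len(one_hop), len(two_hop))
```

The two-hop set always contains the one-hop set here, because every point is its own nearest neighbor in this brute-force search.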


4. Conclusion
The main advantages of RandLa-Net are
- It is lightweight and achieves state-of-the-art results compared to existing methods. The random sampling method reduces the computation.
- The proposed attention-based Local Feature Aggregation (LFA) can expand into larger receptive fields using Local Spatial Encoding (LSE) with attentive pooling of point and neighbor features.
- The network consists of shared MLPs and does not need graph construction or voxelization.
- The encoder-decoder architecture with downsampling aims to generate discriminative latent vectors from a small sample of points that represent the objects of interest.
Its main limitations are
- The random downsampling rate can influence the performance of the model. Removing too many points will prevent the model from learning rich latent representations.
- Even though the RandLaNet input allows the addition of other features such as intensity, gradient, etc., it fails to learn local geometrical information. It learns the average shape of the object, which causes over-segmentation. For more information, see the Thesis Report (look at the Modified RandLa-Net with Feature Extractor and Voxel Segmentation Results).
5. Bibliography
[1] Anh Nguyen, Bac Le, 3D Point Cloud Segmentation - A Survey, 2013 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), 2013, pp. 225-230.
[2] Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 77-85.
[3] Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, A. Trigoni, A. Markham, RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.