-
-
Notifications
You must be signed in to change notification settings - Fork 9
/
spatialindex.qmd
380 lines (271 loc) · 21.6 KB
/
spatialindex.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
```{r,echo=FALSE,message=FALSE,warning=FALSE}
library(lidR)
options(lidR.progress = F)
r3dDefaults = rgl::r3dDefaults
m = structure(c(0.921, -0.146, 0.362, 0, 0.386, 0.482, -0.787, 0,
-0.06, 0.864, 0.5, 0, 0, 0, 0, 1), .Dim = c(4L, 4L))
r3dDefaults$FOV = 50
r3dDefaults$userMatrix = m
r3dDefaults$zoom = 0.75
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
fig.align = "center")
rgl::setupKnitr(autoprint = TRUE)
```
# Spatial indexing {#sec-spatial-indexing}
Spatial indexing is a key feature for performing spatial queries over a large point cloud. Any search for points of interest in the absence of indexing would require a “sequential scan” of every point - this could take *a lot* of time. In brief, spatial indexing organizes data into a search structure that can be quickly traversed to find specific records. Some algorithms would take unreasonable amounts of time to complete without spatial indexing.
## Introduction to spatial indexes {#sec-spatial-indexing-intro}
This section presents a layman overview of how spatial indexing works. If the reader is already knowledgeable about spatial indexing, they can skip this section.
Imagine we have a lidar point cloud with 1 million points (sounds like alot... but it isn't!). We want to query all points that fall within the extent of a circle centered on the coordinates `p = c(300, 350)` with a radius (`R`) of 25 meters. This is a typical query made thousands of times per second by many algorithms including the local maximum filter (@sec-lmffw) to locate individual trees. Without spatial indexing, the method consists of computing the distance to `p` for every single point. This is called a 'sequential scan', and it means that the computation for distance and comparisons must be conducted 1,000,000 times each.
In R we can write:
```{r spi-sequential-scan}
p = c(300, 350)
R = 25
X = runif(1e4, 0, 1000)
Y = runif(1e4, 0, 1000)
query = sqrt((X - p[1])^2 + (Y - p[2])^2) < R
Xq = X[query]
Yq = Y[query]
```
```{r plot-sequential-scan, echo = FALSE, fig.width=4.4}
plot(X,Y, pch = 19, cex = 0.1, asp = 1)
points(Xq, Yq, col = "red", cex = 0.4, pch = 19)
```
This is what the `filter_poi()` function does. It a non-specialized function that enables querying points of interest (POI) based on their attribute values (including non-spatial queries such as `Intensity > x`).
``` r
filter_poi(las, sqrt((X - p[1])^2 + (Y - p[2])^2) < R)
```
Now imagine we want to perform 1,000,000 queries like that for 1,000,000 different points. That translates to 1,000,000 x 1,000,000 = 1 **billion** operations. This does not scale-up and quickly becomes unrealistic (or at *least* dramatically slow).
With a spatial index, the points are organized in such a way that the computer does not need to perform all the comparisons. In a [quadtree](https://en.wikipedia.org/wiki/Quadtree), for example, the point cloud can be subdivided in 4 quadrants that are themselves subdivided in four quadrants and so on hierarchically (see figure below).
```{r plot-quadtree-scan, echo=FALSE, fig.width=4.4}
plot(X,Y, pch = 19, cex = 0.1, asp = 1)
rect(0,0,1000,1000, border = "red", lwd = 3)
lines(c(500,500), c(0,1000), col = "red", lwd = 3)
lines(c(0,1000), c(500,500), col = "red", lwd = 3)
rect(8,8,492,492, border = "blue", lwd = 2)
rect(8+500,8,492+500,492, border = "blue", lwd = 2)
rect(8,8+500,492,492+500, border = "blue", lwd = 2)
rect(8+500,8+500,492+500,492+500, border = "blue", lwd = 2)
lines(c(250, 250),c(8,492), col = "blue", lwd = 2)
lines(c(250, 250),c(8,492)+500, col = "blue", lwd = 2)
lines(c(250, 250)+500,c(8,492), col = "blue", lwd = 2)
lines(c(250, 250)+500,c(8,492)+500, col = "blue", lwd = 2)
lines(c(8, 492),c(250, 250), col = "blue", lwd = 2)
lines(c(8, 492),c(250, 250)+500, col = "blue", lwd = 2)
lines(c(8, 492)+500,c(250, 250), col = "blue", lwd = 2)
lines(c(8, 492)+500,c(250, 250)+500, col = "blue", lwd = 2)
points(Xq, Yq, col = "red", cex = 0.5, pch = 19)
```
In this example we can immediately exclude 75% of the points (750,000 points) in 4 operations at the top level (in red). The bounding box of our query being `[275,325]x[325,375]` we know that the POIs do not belong in top-left quadrant `([0,500] x [500,1000])` nor in top-right quadrant `([500,1000] x [500,1000])` nor in bottom-right quadrant. At the second level (in blue) in 4 more operations we can exclude another 75% of the remaining points to search only in one quadrant. At this stage we can perform a sequential scan on only 1/16^th^ of the points (i.e. 62,500 points) meaning that we discarded 937,500 points in 8 operations! Consequently our query is (roughly) 16 times faster and could be even faster yet with more subdivision levels. In `lidR` a typical quadtree has 8 levels i.e. the space is subdivided in (2^8^)^2^ = 65,536 quadrants.
As we can see, spatial indexing provides a way to dramatically speed-up many common spatial queries using discs, rectangles, polygons, 2D, 3D and so on. Different types of spatial indexes exist for different purposes but in all cases the use of a spatial index is not free and comes at the cost of greater memory usage.
## Spatial indexes for LAS objects {#sec-spatial-indexing-las}
### Overview {#sec-spatial-indexing-oveview}
`lidR` makes use of spatial indexes in many functions and can choose different types of spatial indexes on-the-fly. So far, the book only presented the function `readLAS()` (see @sec-io) but the package has some variations of `readLAS()` named `readALSLAS()`, `readTLSLAS()`, `readUAVLAS()` and so on that enable the registration of a point cloud type allowing `lidR` to adequately choose the most appropriated spatial indexing method to perform a given computation as fast as possible.
As an example we can use the TLS point cloud `pine_plot.laz` from the `TreeLS` package.
First we read the file with `readLAS()`, which considers the point cloud to be ALS because `lidR` was originally designed for ALS and by legacy `readLAS()` from version \<= 3.1 behaves optimally for ALS. In the second case we use `readTLSLAS()` to inform `lidR` that this point cloud was sampled with a terrestrial device.
In the following test we can see that the computation time was reduced from \~2.5 sec to \~1.3 sec by registering the proper point type. Improvements may range from 2 to 10 times faster depending on the point cloud and the method used.
```{r bench-segment-shape}
las <- readLAS("data/pine_plot.laz", select='xyz')
tls <- readTLSLAS("data/pine_plot.laz", select='xyz')
system.time(segment_shapes(las, shp_plane(k = 15), "Coplanar"))
system.time(segment_shapes(tls, shp_plane(k = 15), "Coplanar"))
```
This works for each method that implies many sequential spatial queries.
In the following example we can observe a \~8 fold processing time reduction.
```{r bench-point-metrics}
system.time(point_metrics(las, r = 1, ~length(Z)))
system.time(point_metrics(tls, r = 1, ~length(Z)))
```
Now lets try with an ALS point cloud. We can see that it's better to read an ALS point cloud as ALS rather than as TLS (\~2 fold difference). This is because registering the correct point cloud type enables the selection of the optimal spatial indexing algorithm internally.
```{r bench-classify-noise}
als = readALSLAS("data/ENGINE/catalog/tiles_338000_5238500_1.laz")
tls = readTLSLAS("data/ENGINE/catalog/tiles_338000_5238500_1.laz")
system.time(classify_noise(als, sor()))
system.time(classify_noise(tls, sor()))
```
> **Take away**: It is always a good idea to use the functions `readALSLAS()`, `readTLSLAS()`, `readDAPLAS()`, and so on introduced in `lidR v3.1.0`.
There are however caveats resulting from using the optimal `read*LAS()` function that may not guarantee optimal processing performance.
1. All functions do not use spatial indexing or do not use the spatial index framework of `lidR`. For example, the `kriging()` function is based on the `gstat` package.
2. The choice of spatial index relies on some assumptions that may not be met in specific point clouds. The internal dispatch is designed to work with 'typical' point clouds under some assumptions. An ALS point-cloud is typically spatially large (1 km² or more) with little `Z` dispersion (0 to 40 meters) relative to the `XY` dispersion (0 to 1000 meters). On the contrary, a TLS point cloud is typically spatially narrow (\~3000 m²) with larger variations in `Z` relative to `XY`.
`read*LAS()` should be sufficient for most use cases but for some specific cases users can manually choose which spatial index is best suited. We cover this in the next section.
### Spatial indexes and selection strategies {#sec-spatial-indexing-strategies}
`lidR` currently has 4 spatial indexes: a grid partition, a voxel partition, a quadtree and an octree. Each has its own pros and cons.
Grid partition and quadtree are 2D indexes while voxel partition and octree are 3D indexes. They are all able to perform any kind of spatial query similarly. This is why it doesn't matter if the point cloud is read with `readALSLAS()` or `readTLSLAS()`. The result will be the same. However their efficiency depends on the point cloud type and the query type. This is why using the proper `read*LAS()` function can matter.
#### ALS strategies {#sec-spi-ALS-strategy}
For ALS we use a 2D index even for 3D queries.
Indeed, an ALS point cloud is 'mostly 2D' because more than **99% of the dispersion is in XY**. When querying the knn of a given point (3D query) from a 2D index the vast majority of the points are discarded on a 2D basis. The remaining sequential scan occurs only on a very tiny fraction of the data set. This is also true for a 3D index but querying a 3D spatial index is slower and thus in `lidR` our 2D indexes perform best for ALS. A grid partition is used by default because it is often faster than a quadtree because ALS points are **uniformly distributed** on `XY`.
The following example demonstrates how to manually register a spatial index and compare the computation times for a quadtree and an octree.
```{r bench-classify-noise-2, error=T}
las = readLAS("data/ENGINE/catalog/tiles_338000_5238500_1.laz", select = "xyz")
index(las) <- "quadtree"
system.time(classify_noise(las, sor()))
index(las) <- "octree"
system.time(classify_noise(las, sor()))
```
#### TLS strategies {#sec-spi-TLS-strategy}
For TLS we use a 3D index because the points are **almost evenly distributed in XYZ** and thus a 2D query does not allow for discarding a large fraction of the points - the sequential scan remains important. An octree is used because points are expected to be **not uniformly** distributed on `XYZ`.
```{r bench-classify-noise-3, error=TRUE}
file <- system.file("extdata", "pine_plot.laz", package="TreeLS")
las <- readLAS(file, select='xyz')
index(las) <- "quadtree"
system.time(classify_noise(las, sor()))
index(las) <- "octree"
system.time(classify_noise(las, sor()))
```
#### DAP and UAV strategies
For digital photogrammetry and UAV data we apply the same rules as TLS. When encountering a data set that does not follow these rules, it may be optimal to manually select a spatial index. This is the case of the data set seen in chapter @sec-pba-applications-roof, which is an ALS data set but in practice it's a small subset in which we can no longer say that more than 99% of the point dispersion is in `XY` only. In that sense it's more of a TLS *ish* point cloud. But in the meantime the points are uniformly spread on `XY` because it's actually an ALS data set. Thus making 3D queries using a 3D index most viable. Let's try it:
```{r bench-segment-shape-2}
las <- readLAS("data/chap11/building_WilliamsAZ_Urban_normalized.laz")
index(las) <- "gridpartition"
system.time(segment_shapes(las, shp_plane(k = 20), "planar", filter = ~Classification != LASGROUND))
index(las) <- "voxelpartition"
system.time(segment_shapes(las, shp_plane(k = 20), "planar", filter = ~Classification != LASGROUND))
```
We see that both tests are almost equal, and that octree is slower. But one may find limit cases where its worth it to perform manual selection and thus `lidR` allows for overwriting the default rules. More details in `help("lidR-spatial-index")`.
### C++ API {#sec-spatial-indexing-cpp-api}
For more advanced users and developers, the `lidR` spatial index framework is provided as header-only C++ classes meaning that users can link to `lidR` to develop R/C++ applications using `lidR` spatial indexes. If the reader is not comfortable with the terms C++, Rcpp, header-only, external pointer and other C++ related concepts, we understand! You can skip this section, which is dedicated to advanced users and package developers who want to develop complex and efficient tools.
For the purpose of this example we will create a function `clip_disc()` similar to `clip_circle()` available in `lidR`. `clip_circle()` performs a sequential scan and is thus not suitable to perform many queries in a loop. The function `clip_disc()` on the contrary will take advantage of spatial indexing.
There is only one C++ class to know named `SpatialIndex`. It has one constructor that accepts an `S4` class and has two public members `knn` and `lookup`.
First we can write a C++ function that returns a pointer on a `SpatialIndex`. Here we are using an external pointer because it's simple to write, and implies fewer lines of code. We can however also imagine taking advantage of Rcpp modules.
``` cpp
// [[Rcpp::depends(lidR)]]
#include <SpatialIndex.h>
using namespace Rcpp;
using namespace lidR;
// [[Rcpp::export]]
XPtr<SpatialIndex> spatial_index(S4 las) {
SpatialIndex* idx = new SpatialIndex(las);
XPtr<SpatialIndex> p(idx, true);
return p;
```
```{r compile-spatial-index, echo=FALSE}
Rcpp::sourceCpp(code = "
// [[Rcpp::depends(lidR)]]
#include <SpatialIndex.h>
using namespace Rcpp;
using namespace lidR;
// [[Rcpp::export]]
XPtr<SpatialIndex> spatial_index(S4 las) {
SpatialIndex* idx = new SpatialIndex(las);
XPtr<SpatialIndex> p(idx, true);
return p;
}")
```
Now we can instantiate a `SpatialIndex` at the R level.
```{r run-spatial-index}
las = readLAS("data/ENGINE/catalog/tiles_338000_5238500_1.laz")
index = spatial_index(las)
index
```
What has been created here is either a grid partition, a voxel partition, a quadtree or an octree depending on which `readLAS()` function was used to read the files or depending on the spatial index that was manually registered.
Then we can write the C++ side of the query.
``` cpp
// [[Rcpp::export]]
IntegerVector filter_disc_with_index(SEXP xptr, double xc, double yc, double r) {
XPtr<SpatialIndex> tree(xptr);
Circle circ(xc, yc, r);
std::vector<PointXYZ> pts;
tree->lookup(circ, pts);
IntegerVector ids(pts.size());
for(int i = 0 ; i < pts.size(); i++) { ids[i] = pts[i].id; }
return ids + 1; // C++ is 0-indexed
}
```
```{r compile-clip-disc, echo = FALSE}
Rcpp::sourceCpp(code = "
// [[Rcpp::depends(lidR)]]
#include <SpatialIndex.h>
using namespace Rcpp;
using namespace lidR;
// [[Rcpp::export]]
IntegerVector filter_disc_with_index(SEXP xptr, double xc, double yc, double r) {
XPtr<SpatialIndex> tree(xptr);
Circle circ(xc, yc, r);
std::vector<PointXYZ> pts;
tree->lookup(circ, pts);
IntegerVector ids(pts.size());
for(int i = 0 ; i < pts.size(); i++)
ids[i] = pts[i].id;
return ids + 1; // C++ is 0-indexed
}")
```
And the R side of the query
```{r define-clip-disc}
clip_disc = function(las, index, xcenter, ycenter, radius) {
ii <- filter_disc_with_index(index, xcenter, ycenter, radius)
return(las[ii])
}
```
Now we can make a query and verify that both functions return the same points.
```{r run-clip-disc}
sub1 = clip_disc(las, index, 338200, 5238585, 10)
sub2 = clip_circle(las, 338200, 5238585, 10)
sub1
sub2
```
While there is no gain with a single query because of the overhead of creating an index, it is indispensable to perform many successive queries. In the following we perform 50 queries in a loop.
```{r bench-clip-disc}
n = 50
x = runif(n, 338000, 338500)
y = runif(n, 5238500, 5239000)
system.time(for (i in 1:n) u = clip_circle(las, x[i], y[i], 10))
system.time(for (i in 1:n) u = clip_disc(las, index, x[i], y[i], 10))
```
For more functionalities one can look at the source code of [SpatialIndex](https://github.com/Jean-Romain/lidR/blob/devel/inst/include/SpatialIndex.h) where we can see there are actually 2 constructors and 5 members including 2D and 3D knn, 2D and 3D knn with maximum radius and `lookup` that is templated to allow queries within any kind of user-defined shapes. The source code of many `lidR` functions such as [lmf()](https://github.com/Jean-Romain/lidR/blob/20893dd0d737394c876bd8238b3a88ad6141ea62/src/LAS.cpp#L360) or [detect_shape()](https://github.com/Jean-Romain/lidR/blob/20893dd0d737394c876bd8238b3a88ad6141ea62/src/LAS.cpp#L649) might be useful resources as well.
``` cpp
SpatialIndex(const Rcpp::S4 las);
SpatialIndex(const Rcpp::S4 las, const std::vector<bool>& filter);
template<typename T> void lookup(T& shape, std::vector<PointXYZ>& res);
void knn(const PointXY& p, const unsigned int k, std::vector<PointXYZ>& res);
void knn(const PointXYZ& p, const unsigned int k, std::vector<PointXYZ>& res);
void knn(const PointXY& p, const unsigned int k, const double r, std::vector<PointXYZ>& res);
void knn(const PointXYZ& p, const unsigned int k, const double r, std::vector<PointXYZ>& res);
```
### Benchmark {#sec-spatial-indexing-benchmark}
`lidR`'s spatial index framework is very fast, especially when large point clouds are used. In the following we compare how fast `lidR` searches for the 10-nearest neighbours of every point in a 2.3 million point ALS point cloud relative to the `RANN`, `FANN` and `nabor` packages.
```{r bench-knn, echo = FALSE, fig.height=3, fig.width=5, fig.align="center"}
library(ggplot2)
ctg = readLAScatalog("data/ENGINE/catalog/")[1:2,]
las = readLAS(ctg, select = "xyz")
XX = lidR:::coordinates3D(las)
k = 10
res <- microbenchmark::microbenchmark(
RANN = RANN::nn2(XX, k = k),
FNN = FNN::get.knnx(XX, XX, k = k),
nabor = nabor::knn(XX, XX, k = k),
`lidR (1 core)` = lidR:::C_knn(XX$X, XX$Y, XX$X, XX$Y, k, 1L), # 1 cores
`lidR (4 cores)` = lidR:::C_knn(XX$X, XX$Y,XX$X, XX$Y, k, 4L), # 4 cores
times = 1L)
res <- data.frame(
package = res$expr,
runtime = res$time/1e9)
res = res[order(res$runtime),]
res$package = factor(res$package,levels = res$package)
ggplot(res) +
aes(y = runtime, x = package) +
geom_bar(stat = "identity") +
theme_bw() +
xlab("Package") + ylab("Time (s)")
```
We see that it is competitive with the very fast `libnabo` library but does more than `libnabo` since it also performs range queries such as point in discs, rectangles, cylinders, triangles, polygons.
We don't know any R library providing such capability to produce benchmark comparisons. Moreover, `lidR` leverages the C++ classes to allow the creation of efficient third party applications. This functionality is heavily used in the [lidRplugins](https://github.com/Jean-Romain/lidRplugins) package.
## Spatial index for LAS files {#sec-spatial-indexing-files}
Previous sections were dedicated to explaining spatial indexing for `LAS` objects i.e. point clouds read with `readLAS()` and loaded in memory. This section focuses on spatial indexing for LAS files i.e. point clouds stored in las/laz files and not (yet) loaded in memory. The problem of spatial queries at read time is the same but the solution is different because it was developed in an independent context.
Fast spatial queries are made possible by indexing the `.las` or `.laz` files with `.lax` files. A `.lax` file is a tiny file associated with a `.las` or `.laz` file that spatially indexes the points. This file type was created by Martin Isenburg in [LAStools](https://rapidlasso.com/). For a better understanding of how it works one can refer to a talk given by Martin Isenburg about [lasindex](https://rapidlasso.com/2012/12/03/lasindex-spatial-indexing-of-lidar-data/). In short it uses quadtree.
By adding `.lax` files along with your `.las`/`.laz` files it is possible to make fast 2D queries **without reading the whole file**. The best way to create a `.lax` file is to use [laxindex](https://www.cs.unc.edu/~isenburg/lastools/download/lasindex_README.txt) from [LAStools](https://rapidlasso.com/lastools/). It is a free and open-source part of LAStools. If you cannot or do not want to use LAStools the `rlas` package has a function to creates lax files but `lasindex` should be preferred.
``` r
rlas::writelax("file.las")
```
The gain is really significant and transparent for users. If you have a `.lax` file it will be used.
Here we test with 150 queries from the same indexed and a non-indexed `LAScatalog` with 400 files:
``` r
indexed = readLAScatalog("LiDAR with lax/")
noindex = readLAScatalog("LiDAR no lax/")
clip_circle(indexed, xc, yc, radius = 12)
#> 45 sec
clip_circle(noindex, xc, yc, radius = 12)
#> 4 sec
```
If the reader did not skip @sec-spatial-indexing-cpp-api they might have noticed that `clip_circle()` can use a spatial index with a `LAScatalog` but not with a `LAS`. This is because they behave very differently internally and rely on two independent mechanisms. With a `LAScatalog` it inherits the capabilities of the library used to read the files while with a `LAS` object nothing has been implemented (yet) for taking advantage of spatial indexing at the R level (but the section above provide the solution).
It's easy to guess that every `clip_something()` function can take advantage of spatial indexing with `.lax` files but the `LAScatalog` processing engine also makes heavy usage of such features. Users can significantly reduce the processing time by loading a buffer faster. Indeed loading a buffer implies spatial queries. This topic is covered by the vignette: [Speed-up the computations on a LAScatalog](https://cran.r-project.org/web/packages/lidR/vignettes/lidR-computation-speed-LAScatalog.html).